I. Introduction

Among  approaches  to  the  estimation  of  signals,  or  signal  parameters, 
in  the  presence  of  noise,  the  following  can  be  distinguished: 

a)  Approaches  in  which  a-priori  statistics  are  not  associated  with 
the  parameters  to  be  estimated,  versus  approaches  in  which  a-priori 
statistics  are  associated  with  the  unknown  parameters. 

Typical methods utilized in connection with the former approach
are maximum likelihood,(1,2) minimum variance unbiased, and
use of Cramer-Rao, Barankin, or other lower bounds.(3,4)

In cases where a-priori statistics are associated with the
unknown parameters, typical approaches involve a-posteriori
probability and decision theoretic methods.

b) Approaches in which the signals are regarded as deterministic
except for some set of initially unknown parameters, typical of which
are most treatments of the estimation of radar waveform parameters,(1,2)
versus approaches in which the signals are regarded as random
processes, as in Wiener filtering theory and its generalizations.(1,8)

One may also distinguish cases in which the number of initially
unknown parameters is finite (as in most radar waveform estimation
applications), and those in which there is an infinite set of initially
unknown parameters (examples of which will be given in this paper).

Any views expressed in this paper are those of the author. They
should not be interpreted as reflecting the views of The RAND Corporation
or the official opinion or policy of any of its governmental or private
research sponsors. Papers are reproduced by The RAND Corporation as a
courtesy to members of its staff.

The  case  in  which  the  signals  are  regarded  as  random  processes 
can  be  considered  equivalent  to  that  in  which  there  is  an  infinite 
set  of  unknown  parameters  (since  the  signal  process  can  be  represented 
in  terms  of  such  a  set)  and  in  which  a-priori  statistics  are  associated 
with  these  parameters.  In  cases  where  there  are  only  a  finite  number 
of  unknown  parameters,  but  a-priori  statistics  are  associated  with 
them,  the  signal  can  also  be  regarded  as  a  random  process,  albeit 
one  whose  sample  space  is  finite  dimensional. 

Section  II  of  this  paper  is  devoted  to  treating  these  various 
cases  on  a  unified  basis.  In  Section  II.  1,  it  is  shown  that  the 
Maximum  A-Posteriori  (MAP)  estimate  for  the  case  where  a-priori 
statistics  are  associated  with  the  unknown  parameters  is  under  certain 
conditions  equivalent  to  a  Maximum  Likelihood  (ML)  estimate  in  an 
equivalent  problem  in  which  the  a-priori  statistics  associated  with 
the  parameters  are  regarded  as  providing  additional  equivalent 
observations  of  the  parameters  (this  is  not  the  same  as  the  well 
known  fact  that  the  ML  estimate  is  equal  to  the  MAP  estimate  if  the 
a-priori  pdf's  of  the  parameters  are  uniform). 

We also define mixed ML-MAP estimates for cases in which some
but not all of the parameters are assumed to have a-priori statistical
distributions; equivalent formulations are then given where these same
estimates appear as pure ML estimates or alternatively as pure MAP
estimates.

In Section II.2, attention is turned to a class of generalized
least  squares  estimation  procedures  which  give  the  ML-MAP  estimates 
if  the  statistics  are  Gaussian,  but  which  remain  good  estimates  even 
for  non-Gaussian  statistics.  These  are  formulated  for  the  case  where 
a-priori  statistics  associated  with  the  unknown  parameters,  or  with  some 
subset  thereof,  are  regarded  as  providing  the  equivalent  of  additional 
observations,  and  a  first  order  error  analysis  is  given,  from  which 
the  estimation  error  statistics  are  then  derived.  A  result  is  also 
proved  according  to  which  parameters  having  a-priori  statistics  can 
be  under  certain  conditions  considered  equivalent  to  additional  noise, 
insofar  as  concerns  estimation  of  other  parameters. 

The  foregoing  analysis  is  then  applied  in  Section  II.  3  to  linear 
minimum  variance  filtering  theory,  which,  as  is  shown,  can  be  regarded 
as  an  application  of  parameter  estimation  with  an  infinite  number  of 
parameters,  with  a-priori  statistics  associated  with  them,  the  a-priori 
statistics  in  turn  being  regarded  as  providing  the  equivalent  of 
additional  observational  data. 

In  Section  II.  3  we  also  apply  the  results  of  Section  II.  2  to 
prove  that  additive  noise  may  itself  be  considered  to  contribute  an 
additional  infinity  of  parameters  to  be  estimated  jointly  with  the 
signal  parameters  (the  noise  statistics  being  considered  to  contribute 
an  infinite  number  of  additional  equivalent  "observations").  The 
signal  parameter  estimates  are  shown  to  be  in  certain  cases  the  same, 
whether  the  noise  is  considered  as  noise  or  as  an  additional  number 
of  parameters  to  be  jointly  estimated  with  the  signal  parameters. 


Section  III  is  devoted  to  the  application  of  the  results  of 
Section  II  to  the  problem  of  obtaining  recursive  solutions  to  certain 
very general forms of the signal estimation problem. These recursive
solutions are of the type introduced by Swerling and further
investigated by Kalman, Kalman and Bucy, Blum, and others.

Recursive  solutions  are  derived  in  Section  III.  2  for  the  case 
where  the  observed  signal  consists  of  the  sum  of  K  random  processes 
called "signal" processes added to another random process called the
"noise"  process.  The  only  assumption  made  regarding  the  signal 
processes  is  that  they  be  continuous  in  the  mean;  the  only  assumption 
regarding  the  noise  process  is  that  it  consist  of  the  sum  of  two 
components,  one  of  which  is  continuous  in  the  mean,  the  other  being 
white  noise. 

The  problem  is  reduced  to  the  pure  white  noise  case  by  regarding 
the  non-white  noise  component  as  an  additional  process  to  be  estimated, 
and further regarding all the signal processes, as well as the
non-white component of the noise process, as equivalent to an infinite
number of parameters to be jointly estimated. The recursive techniques
defined by Swerling(9) are then applied directly.

The  result  is  a  set  of  simultaneous,  non-linear  partial  differential 
equations,  the  solution  to  which  gives  the  desired  recursive  solution 
to  the  optimum  linear  filtering  problem.  However,  this  is  not  as  bad 
as  it  sounds,  since  the  recursive  solution  to  the  filtering  problem 
results  directly  from,  and  is  in  fact  identical  with,  the  process  of 
building  up  the  solutions  to  the  set  of  partial  differential  equations 


from  the  initial  conditions. 
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Section  III.  3  is  devoted  to  extending  these  results  to  cases 
where  sane  components  of  the  noise  may  be  non-additive. 

Section  IV  contains  some  discussion  of  additional  problems  and 
applications  suggested  by  the  results  of  previous  sections. 


There is also an appendix which treats an example of an estimation
problem involving an infinite number of unknown parameters; in one
version, the problem can be solved without associating a-priori
statistics with the unknown parameters, while a slightly modified
version cannot be properly formulated or solved without associating
a-priori statistics with the unknown parameters.

This paper is not completely self-contained, since heavy
reliance is placed on the results of References 2 and 9. The most
important formulas required are reproduced, but the discussion of a
number of points is abbreviated, with reference being made to further
discussion in the references.
II. ML, MAP, and Generalized Least Squares Estimates

II.1. Maximum Likelihood and Maximum A-Posteriori Estimates

Let S represent a set of observational data, and let x represent
a parameter (possibly multi-dimensional, or even infinite-dimensional)
upon which the probability distribution of S depends. Suppose that x has
an a-priori probability density function p(x) associated with it.*
We will suppose that the joint probability density function of S and
x exists and is denoted p(S, x). The existence of the conditional
probability densities p(S | x) and p(x | S) is also assumed. Then, the
maximum a-posteriori (MAP) estimate $\hat{x}$ of x is that value of x
which maximizes p(x | S). (In the following, the symbol S will be used to
denote both the observed value of a random variable and the argument
of various pdf's; also, the symbol p will be used to denote a variety
of pdf's with, hopefully, confusion avoided by the fact that the
argument of p will indicate which pdf we are talking about.)

Now suppose that the a-priori pdf of x, p(x), is symmetrical
about some point $\bar{x}$:

$$p(x) = f(x - \bar{x}) = f(-x + \bar{x}) \tag{1}$$

(If p(x) is also unimodal, then $\bar{x}$ would be the MAP estimate of x
based on just the a-priori information.)

The following equivalent estimation problem can then be
formulated:

$\bar{x}$ will be regarded as the observed value of an additional
"equivalent" or "virtual" observation $S^{(e)}$. This additional
observation will be represented as

$$S^{(e)} = x + \delta S^{(e)} \tag{2}$$

where $\delta S^{(e)}$ = equivalent observation error.

*Throughout the paper, pdf's are defined with respect to Lebesgue
measure in finite-dimensional sample spaces. Many results are derived
for infinite-dimensional sample spaces, but these are always derived by
valid limiting processes from finite-dimensional approximations.

As stated, the observed value of the random variable $S^{(e)}$ in any
case is assumed to be $\bar{x}$, so the value of the equivalent
observation error is $\bar{x} - x$.

The random variable $S^{(e)}$ is assumed to have a pdf characterised
by the conditions:

$$E^{(e)}[S^{(e)}] = x, \qquad \text{pdf of } \delta S^{(e)} = \text{a-priori pdf of } x - \bar{x} \tag{3}$$

This amounts to saying

$$p^{(e)}[S^{(e)} | x] = f[S^{(e)} - x] \tag{4}$$

In this equivalent problem, we regard x as a constant, and $S^{(e)}$
as a random variable with pdf given by (4). It is also assumed that
the equivalent random variable $\delta S^{(e)}$ is statistically
independent of S.

Now, the maximum likelihood estimate of x for this equivalent
problem is obtained as follows. The likelihood function is

$$p[S, S^{(e)} | x] = p[S | x] \, p^{(e)}[S^{(e)} | x] = \frac{p(x | S) \, p^{(e)}[S^{(e)} | x] \, q(S)}{p(x)} \tag{5}$$

where $q(S) = \int p(S, x) \, dx$.

The maximum likelihood estimate of x is obtained by substituting
the observed values of S and $S^{(e)}$ into (5) and then maximizing with
respect to x. However, the observed value of $S^{(e)}$ is $\bar{x}$. Thus,
when $\bar{x}$ is substituted for $S^{(e)}$ in (5), and use is then made
of (4) and (1), the factor $p^{(e)}[\bar{x} | x] = f(\bar{x} - x) = p(x)$
cancels the denominator, and we find that the maximum likelihood estimate
of x is obtained by maximizing p(x | S) and is thus equal to the MAP
estimate for the original problem.
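
The equivalence just derived can be checked numerically in a simple case. The following is a minimal sketch, assuming a scalar parameter, a single observation S = x + noise with Gaussian noise, and a Gaussian (hence symmetric) a-priori pdf; the numerical values and the names `gauss_pdf`, `posterior_unnormalized` and `augmented_likelihood` are illustrative assumptions, not part of the paper.

```python
import math

def gauss_pdf(z, mean, sigma):
    """Gaussian density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((z - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical numbers chosen only for illustration.
s_obs = 2.0      # actual observation S
sigma_n = 1.0    # standard deviation of the observation error
x_bar = 0.0      # centre of symmetry of the a-priori pdf
sigma_0 = 2.0    # a-priori standard deviation

def posterior_unnormalized(x):
    # p(S | x) p(x), proportional to p(x | S) in the original problem
    return gauss_pdf(s_obs, x, sigma_n) * gauss_pdf(x, x_bar, sigma_0)

def augmented_likelihood(x):
    # p[S | x] p_e[S_e | x] with the virtual observation S_e set to x_bar
    return gauss_pdf(s_obs, x, sigma_n) * gauss_pdf(x_bar, x, sigma_0)

grid = [i / 1000.0 for i in range(-5000, 5001)]
x_map = max(grid, key=posterior_unnormalized)   # MAP estimate, original problem
x_ml = max(grid, key=augmented_likelihood)      # ML estimate, equivalent problem

# Closed-form maximizer for this Gaussian case (precision-weighted mean).
x_closed = (s_obs / sigma_n ** 2 + x_bar / sigma_0 ** 2) / (1 / sigma_n ** 2 + 1 / sigma_0 ** 2)
```

In this sketch the two grid searches return the same point, which also agrees with the closed-form precision-weighted mean for the Gaussian case.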

We can also define mixed ML-MAP estimates for the case where
some but not all of the unknown parameters have a-priori statistics.
Suppose we write

$$x = (u, v) \tag{6}$$

(each of u and v may be multi-dimensional).

Also suppose that an a-priori pdf p(u | v) is associated with
u (possibly the a-priori pdf of u depends on v as a parameter) but
no a-priori statistics are associated with v. Then, the mixed
ML-MAP estimate of x is defined to be

$$\hat{x} = \text{ML-MAP estimate of } x = \text{value which maximizes } p(S | x) \, p(u | v) \tag{7}$$

We can define $\hat{x}$ also as a pure MAP estimate by associating
with v an a-priori pdf p(v) which is uniform over an extremely wide
range of values. If the resulting pdf for (u, v) = x is then
denoted p(u, v) = p(x), then $\hat{x}$ is the value which maximizes
p(S | x) p(x).

$\hat{x}$ can also be represented as a pure ML estimate by considering
the a-priori statistics associated with u to be equivalent to
providing an additional "virtual" observation $S^{(e)}$ whose observed
value is $\bar{u}$ and whose pdf is given by

$$E^{(e)}[S^{(e)} | x] = u \tag{8}$$

$$p^{(e)}[S^{(e)} | x] = f[S^{(e)} - u | v] = f[u - S^{(e)} | v] \tag{9}$$

where it has been assumed that p(u | v) is of the form

$$p(u | v) = f[u - \bar{u} | v] = f[\bar{u} - u | v] \tag{10}$$

$\hat{x} = (\hat{u}, \hat{v})$ is then obtained by maximizing, with
respect to (u, v), the quantity

$$p^{(e)}[S, S^{(e)} | u, v] = p(S | u, v) \, p^{(e)}[S^{(e)} | u, v] \tag{11}$$

with the value $\bar{u}$ substituted for $S^{(e)}$ and the actual
observational data substituted for S.

In the rest of this paper, when we adjoin equivalent observational
data, in the above-described manner, to the actual observed data, in
order to obtain an ML estimate equal to the original MAP or mixed
ML-MAP estimate, we will call these "virtual" or "equivalent"
observations as distinguished from the "actual" observations. Such
"equivalent observations" may be regarded as the parameter estimates
which would be made if there were no actual observed data but only
a-priori statistics for the parameters; they will generally be denoted
by a superscript "e".


II.2. Generalized Least Squares Estimates

Attention will now be turned to a class of generalized least
squares estimates analyzed extensively by Swerling.(2,9)

Suppose the observational data is given by

$$S_\mu = f_\mu(x_1, x_2, \ldots; t_\mu) + \epsilon_\mu \tag{12}$$

where

$S_\mu$ = $\mu$th observation (a real scalar)

$f_\mu(x, t_\mu) = f_\mu(x_1, x_2, \ldots, t_\mu)$ = value of the $\mu$th
observation in the absence of observation error

$\epsilon_\mu$ = observation error for the $\mu$th observation

$x = (x_1, x_2, \ldots)$ = finite or infinite set of unknown parameters
(real scalars), not yet assumed to have a-priori statistics

$t_\mu$ = time of the $\mu$th observation (assumed known)

Consider the class of estimation procedures defined as follows:
$\hat{x}_i$ are estimates obtained by minimizing, with respect to
$(x_1, x_2, \ldots)$, the quantity

$$Q = \sum_{\mu, \nu} \eta_{\mu\nu} [S_\mu - f_\mu(x, t_\mu)] [S_\nu - f_\nu(x, t_\nu)] \tag{13}$$

where

$(\eta_{\mu\nu})$ = arbitrary symmetric positive definite matrix
(not necessarily the inverse covariance matrix of $\{\epsilon_\mu\}$;
see below)
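
As a concrete illustration of the quantity Q in Eq. (13), the following minimal sketch evaluates the double sum for a finite set of observations. The function name `generalized_q`, the straight-line model `line` and the data values are hypothetical choices made only for this example; the smoothing matrix is taken as the identity purely for simplicity.

```python
def generalized_q(s, f, x, t, eta):
    """Evaluate Q = sum over (mu, nu) of eta[mu][nu] * r[mu] * r[nu],
    where r[mu] = s[mu] - f[mu](x, t[mu]) is the residual of observation mu."""
    residuals = [s[mu] - f[mu](x, t[mu]) for mu in range(len(s))]
    n = len(residuals)
    return sum(eta[mu][nu] * residuals[mu] * residuals[nu]
               for mu in range(n) for nu in range(n))

def line(x, t):
    # Hypothetical observation function: intercept x[0] plus slope x[1] times t.
    return x[0] + x[1] * t

t = [0.0, 1.0, 2.0]                 # observation times (assumed known)
s = [1.0, 3.0, 5.0]                 # error-free observations of the line 1 + 2 t
f = [line, line, line]              # f may in general differ per observation
eta = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]             # identity smoothing matrix (an assumption)

q_at_truth = generalized_q(s, f, [1.0, 2.0], t, eta)
q_at_origin = generalized_q(s, f, [0.0, 0.0], t, eta)
```

With error-free data, Q vanishes at the true parameter values and is positive elsewhere, which is what minimizing Q exploits.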


Before  proceeding,  several  comments  should  be  made: 


a) The subscript $\mu$ on f indicates that the functional dependence
of the observations on the unknown parameters may differ for each
observation. However, $f_\mu$ is assumed to be a known function. (If
$f_\mu$ is not exactly known, this can still be represented by assuming
$f_\mu$ to be known and compensating for this assumption by adding an
equivalent observation error.(9))

b) The functions $f_\mu$ need not depend explicitly on $t_\mu$. Also,
$t_\mu$ may be replaced by one or more spatial or space-time parameters,
or for that matter, by an element of an abstract parameter set. However,
for each observation, $t_\mu$ (or the abstract parameter replacing it) is
assumed known. If in a practical case $t_\mu$ is not perfectly known, we
can still assume it to be known and compensate for this assumption
by adding an equivalent observation error.

c) In most cases of interest, one or more components of the
observational data may consist of functions of a continuous time
(or other) parameter, of the form

$$S(t) = g(t, x_1, x_2, \ldots) + \epsilon(t), \quad t \in (T_1, T_2) \tag{14}$$

In such a case, the term in Q contributed by such observations
must be understood as the limiting form of finite sums. Such limiting
forms can be well-defined and are often expressible as
integrals.(1,2,4,14) (More precisely, this applies to those terms of
Q remaining after terms which are independent of x are subtracted.)

For example, we could use discrete sampled times $\{t_\mu\}$ and have

$$S_\mu = S(t_\mu), \quad f_\mu(x, t_\mu) = g(t_\mu, x), \quad \epsilon_\mu = \epsilon(t_\mu), \quad (\eta_{\mu\nu}) = [\phi(t_\mu, t_\nu)]^{-1} \tag{15}$$

where $\phi$ is a covariance function (not necessarily that of
$\epsilon(t)$; see below).

The corresponding term of Q is the limit as the set $\{t_\mu\}$ becomes
dense in $(T_1, T_2)$.




Alternatively one can consider the observed data to be the
coefficients of S(t) with respect to a complete orthonormal set of
functions $\{\psi_\mu(t)\}$ over $(T_1, T_2)$:

$$S_\mu = \int_{T_1}^{T_2} S(t) \, \psi_\mu(t) \, dt \tag{16}$$

$$f_\mu(x) = \int_{T_1}^{T_2} g(t, x) \, \psi_\mu(t) \, dt \tag{17}$$

(This is an example where $f_\mu$ does not depend explicitly on $t_\mu$.)

If $\epsilon(t)$ is considered a sample function of a random process
with covariance function $\phi$, then a convenient choice for
$\{\psi_\mu\}$ is the set of orthonormal eigenfunctions associated with
$\phi$ over $(T_1, T_2)$.

With this understanding, we will continue to write Q as a
double sum.

Now, this method of estimation results in maximum likelihood
estimates under the following two conditions:

A. $\{\epsilon_\mu\}$ are jointly Gaussian random variables.

B. $\{\epsilon_\mu\}$ have zero means, and $(\eta_{\mu\nu})$ is the
inverse of the covariance matrix of the variables $\{\epsilon_\mu\}$.

(For continuous time data, the appropriate limiting statements are
understood.)

If these conditions hold, certain statements can be made about
the optimality of the resulting estimates in the sense that they are
minimum mean square error estimates, either precisely or asymptotically
(see below). However, it is worth emphasizing that the estimates may
still be good ones even if one or both of these conditions fail.

For example, if the statistics are not Gaussian, but condition
B is satisfied, the variances of $(\hat{x}_i - x_i)$ are in many cases
not affected; the only thing affected is what type of statements can be
made about the optimality of the estimates.(2)
If condition B is not satisfied, there will in general be some
degradation in estimation accuracy. However, it may be desirable
from other points of view to use smoothing matrices $(\eta_{\mu\nu})$ not
satisfying condition B. For example, it may be convenient for
computational purposes to use a diagonal smoothing matrix even if the
observation errors are correlated; or, one may not know the error
covariances exactly. The increased computational convenience may be
worth the decrease in accuracy. In any event, the degradation in accuracy
caused by failure of condition B can usually be computed, as is
described in the section of Reference 2 entitled "Mismatched Processing
of Received Signals." Reference 2 also treats the degradation in
estimation accuracy caused if the functions $f_\mu$ utilized in Q do not
exactly describe the true dependence of the observations (in the
absence of observation error) on the parameters $x_1, x_2, \ldots$.

Thus, we will adopt the point of view that the method of
minimizing Q defines estimates which may be good estimates whether or not
conditions A and B are satisfied; in the special case where they are
satisfied, the estimates are ML.

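If the errors $\epsilon_\mu$ are sufficiently small, the estimates
$\hat{x}_i$ resulting from minimization can be written to first order
as(2,9):

$$\hat{x}_i - x_i = \sum_\mu \gamma_{i\mu}(x) \, \epsilon_\mu + \text{higher order terms} \tag{18}$$

$$\gamma_{i\mu}(x) = \sum_j \sum_\nu [B(x)]^{-1}_{ij} \, \eta_{\mu\nu} \frac{\partial f_\nu(x, t_\nu)}{\partial x_j} \tag{19}$$

$$B_{ij}(x) = \sum_{\mu, \nu} \eta_{\mu\nu} \frac{\partial f_\mu(x, t_\mu)}{\partial x_i} \frac{\partial f_\nu(x, t_\nu)}{\partial x_j} \tag{20}$$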
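
When the functions f are linear in the parameters, the first order relation of Eqs. (18) - (20) has no higher order terms, which gives a simple numerical check. The sketch below assumes a two-parameter straight-line model with three observations; the model, the data, the smoothing matrix and all variable names (`gamma`, `b`, `grad`, `inv2`) are illustrative assumptions rather than anything taken from the paper.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

t = [0.0, 1.0, 2.0]                  # observation times
x_true = [1.0, 2.0]                  # true intercept and slope
eps = [0.1, -0.2, 0.05]              # observation errors (fixed, for illustration)
eta = [[2.0, 0.5, 0.0],
       [0.5, 1.0, 0.0],
       [0.0, 0.0, 4.0]]              # symmetric positive definite smoothing matrix
n = len(t)

# Partial derivatives of f(x, t) = x[0] + x[1] * t with respect to x[0], x[1].
grad = [[1.0, t[mu]] for mu in range(n)]
s = [x_true[0] + x_true[1] * t[mu] + eps[mu] for mu in range(n)]

# Matrix B of Eq. (20).
b = [[sum(eta[mu][nu] * grad[mu][i] * grad[nu][j]
          for mu in range(n) for nu in range(n))
      for j in range(2)] for i in range(2)]
b_inv = inv2(b)

# First order coefficients of Eq. (19).
gamma = [[sum(b_inv[i][j] * eta[mu][nu] * grad[nu][j]
              for j in range(2) for nu in range(n))
          for mu in range(n)] for i in range(2)]

# Direct minimization of Q: solve the normal equations B x_hat = rhs.
rhs = [sum(eta[mu][nu] * grad[mu][i] * s[nu]
           for mu in range(n) for nu in range(n)) for i in range(2)]
x_hat = [sum(b_inv[i][j] * rhs[j] for j in range(2)) for i in range(2)]

error_direct = [x_hat[i] - x_true[i] for i in range(2)]
error_first_order = [sum(gamma[i][mu] * eps[mu] for mu in range(n)) for i in range(2)]
```

For this linear model the error of the directly minimized estimate and the first order expression agree to rounding error, regardless of whether the smoothing matrix matches the error covariance.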
It should be emphasized that Eqs. (18) - (20) depend only on
the definition of the method of obtaining $\hat{x}_i$ by minimizing Q;
they do not depend on any statistical interpretation of the quantities
$\epsilon_\mu$ nor on whether conditions A and B hold. (In Eqs. (18) - (20)
and other places below, as will be clear from the context,
$x = (x_1, x_2, \ldots)$ denotes the true values of the parameters.)

If condition B is satisfied, and if the higher order terms can
be neglected, then

$$E[(\hat{x}_i - x_i)(\hat{x}_j - x_j)] = [B(x)]^{-1}_{ij} \tag{21}$$

We will now proceed to define a generalized least squares
smoothing technique for the case where a-priori statistics are
associated with some subset of the parameters $x_1, x_2, \ldots$.

Let us suppose that I is a subset of (1, 2, ...), not necessarily
a proper subset. Suppose that a joint a-priori statistical
distribution is associated with those $x_i$ for $i \in I$.

We will assume that Eq. (12) defines the actual observations.
The generalized least squares estimates are obtained by adjoining
to Q a term corresponding to the "equivalent observations" provided
by the a-priori statistics.

In this case, Q is defined by

$$Q = \sum_{\mu, \nu} \eta_{\mu\nu} [S_\mu - f_\mu(x, t_\mu)] [S_\nu - f_\nu(x, t_\nu)] + \sum_{k, l \in I} \xi_{kl} [x_k - \bar{x}_k] [x_l - \bar{x}_l] \tag{22}$$

where

$(\xi_{kl})$ is an arbitrary symmetric positive definite matrix

$\bar{x}_k$, $k \in I$ are arbitrary constants
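
The effect of adjoining the a-priori term to Q can be seen in the simplest possible case. The sketch below assumes a single scalar parameter observed directly (each observation equals x plus an error), a diagonal smoothing matrix with entries 1/sigma^2, and one a-priori term with weight `xi` and centre `x_bar`; these choices, the data values and the function names are hypothetical and serve only as an illustration.

```python
def q_with_prior(x, s, sigma, x_bar, xi):
    """Q for a directly observed scalar: data term plus a-priori term."""
    data_term = sum((s_mu - x) ** 2 for s_mu in s) / sigma ** 2
    prior_term = xi * (x - x_bar) ** 2
    return data_term + prior_term

def minimize_q(s, sigma, x_bar, xi):
    """Closed-form minimizer of q_with_prior (Q is quadratic in x)."""
    b = len(s) / sigma ** 2 + xi          # scalar analogue of the matrix B
    return (sum(s) / sigma ** 2 + xi * x_bar) / b

s = [1.2, 0.8, 1.1, 0.9]   # actual observations (illustrative values)
sigma = 0.5                # assumed observation error standard deviation
x_bar = 0.0                # "equivalent observation" supplied by the prior
xi = 4.0                   # weight of the equivalent observation

x_hat = minimize_q(s, sigma, x_bar, xi)
q_min = q_with_prior(x_hat, s, sigma, x_bar, xi)
```

The resulting estimate lies between the sample mean of the actual observations and the a-priori centre, with the a-priori term acting exactly like one more observation of weight `xi`.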
Of course, it is clear that in order to make good use of the
a-priori information, $(\xi_{kl})$ and $\bar{x}_k$ are not going to be
completely arbitrary; in fact, $\bar{x}_k$ are going to be something like
the means of the a-priori pdf's of $x_k$, and $\xi_{kl}$ are going to be
related to the a-priori covariance matrix of $\{x_k - \bar{x}_k\}$.
However, for present purposes we can regard any deviation of $(\xi_{kl})$
from the inverse covariance matrix of $\{x_k - \bar{x}_k\}$ as analogous
to a deviation of $(\eta_{\mu\nu})$ from the inverse covariance matrix of
$\{\epsilon_\mu\}$ in the case of the actual observations; and any
deviation of $\bar{x}_k$ from the means of the a-priori distributions as
analogous to a deviation of the functions $f_\mu$ from those which truly
describe the dependence of the actual observations on x.

We wish to apply Eqs. (18) - (20) to obtain the first order
dependence of $\hat{x}_i - x_i$ on $\{\epsilon_\mu\}$ and
$\{\bar{x}_k - x_k\}$, $k \in I$. To facilitate this, we introduce the
following notation:

$$x = (u, v), \quad u = \{x_i\},\ i \in I, \quad \bar{u} = \{\bar{x}_i\},\ i \in I, \quad v = \{x_i\},\ i \notin I \tag{23}$$

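Then, to first order in $\{\epsilon_\mu\}$ and $\{\bar{x}_k - x_k\}$, $k \in I$:

$$\hat{x}_i - x_i = \sum_\mu \gamma_{i\mu}(u, v) \, \epsilon_\mu + \sum_{k \in I} \beta_{ik}(u, v) [\bar{x}_k - x_k] \tag{24}$$

$$\gamma_{i\mu}(x) = \sum_j \sum_\nu [B(x)]^{-1}_{ij} \, \eta_{\mu\nu} \frac{\partial f_\nu(x, t_\nu)}{\partial x_j} \tag{25}$$

$$\beta_{ik}(x) = \sum_{j \in I} [B(x)]^{-1}_{ij} \, \xi_{jk}, \quad k \in I \tag{26}$$

$$B_{ij}(x) = \sum_{\mu, \nu} \eta_{\mu\nu} \frac{\partial f_\mu(x, t_\mu)}{\partial x_i} \frac{\partial f_\nu(x, t_\nu)}{\partial x_j} + \xi^*_{ij} \tag{27}$$

$$\xi^*_{ij} = \xi_{ij} \text{ if } i, j \in I; \qquad \xi^*_{ij} = 0 \text{ if } i \notin I \text{ or } j \notin I \tag{28}$$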

Now, consider the following conditions:

A': $\{\epsilon_\mu\}$ are jointly Gaussian; and $\{x_k\}$, $k \in I$ have
an a-priori joint Gaussian pdf.

B': $\{\epsilon_\mu\}$ have means zero; $\{x_k\}$, $k \in I$ have means
$\bar{x}_k$; and $\{x_k - \bar{x}_k\}$, $k \in I$ are a-priori uncorrelated
with $\{\epsilon_\mu\}$. Also, $\{\epsilon_\mu\}$ have inverse covariance
matrix $(\eta_{\mu\nu})$, and $\{x_k - \bar{x}_k\}$, $k \in I$ have inverse
covariance matrix $(\xi_{kl})$. For the sake of simplicity, $(\xi_{kl})$ is
also assumed to be independent of $\{x_i\}$, $i \notin I$.

If A' and B' are satisfied, $\hat{x}_i$ are the ML-MAP estimates (for
all i).
We will now use Eqs. (24) - (28) to determine the covariance
matrix $E[(\hat{x}_i - x_i)(\hat{x}_j - x_j)]$ of the estimation errors
supposing that just B' holds. It is also assumed that the first order
equation (24) is valid (i.e., that higher order terms are negligible).

The covariance matrix of the estimation errors $\hat{x}_i - x_i$ will be
computed with respect to the statistical ensemble defined by the
statistics of the actual observation errors $\{\epsilon_\mu\}$ and the
a-priori statistics of $\{x_i\}$, $i \in I$. This can be done simply by
forming the products $(\hat{x}_i - x_i)(\hat{x}_j - x_j)$ from
Eqs. (24) - (28), taking expected values with respect to the joint pdf of
$\{\epsilon_\mu\}$, and then taking the expected value with respect to the
a-priori joint pdf of $\{x_i\}$, $i \in I$.

The  result  is  the  following: 


e[ (xi  -  XjMXj  "  xj>]  =  [B(u,  V)J 


An important special case is that where B is independent of x,
in which case the right side of Eq. (29) becomes simply $[B]^{-1}_{ij}$.
Many important applications fall into this category.
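
As a worked illustration of this special case, consider a minimal example that is not taken from the paper: a single scalar parameter x observed n times, $S_\mu = x + \epsilon_\mu$, with uncorrelated errors of variance $\sigma^2$ and an a-priori variance $\sigma_0^2$ (all of these are assumptions made for the example). Taking the smoothing matrix and the a-priori weight to be the inverse variances, Eq. (27) gives a matrix B that does not depend on x:

```latex
\begin{align*}
\eta_{\mu\nu} &= \frac{\delta_{\mu\nu}}{\sigma^2}, \qquad
\xi = \frac{1}{\sigma_0^2}, \qquad
\frac{\partial f_\mu}{\partial x} = 1, \\
B &= \sum_{\mu,\nu} \eta_{\mu\nu} \cdot 1 \cdot 1 + \xi
   = \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}, \\
E\big[(\hat{x} - x)^2\big] &= B^{-1}
   = \frac{\sigma^2 \sigma_0^2}{\sigma^2 + n\,\sigma_0^2}.
\end{align*}
```

As $\sigma_0^2 \to \infty$ the a-priori information carries no weight and the error variance tends to the familiar $\sigma^2 / n$; for finite $\sigma_0^2$ the a-priori term contributes exactly as one additional observation of variance $\sigma_0^2$ would.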

The  above  equations  also  clarify  the  sense  in  which  the  adjoining 
of  "equivalent"  observations  is  actually  equivalent  to  the  a-priori 
statistics . 

Let the expected value of $(\hat{x}_i - x_i)(\hat{x}_j - x_j)$, given
that the true value is x, with respect to the (fictitious) statistical
ensemble defined by the actual observations and equivalent observations,
with x regarded as a constant, be denoted by
$E^{(e)}[(\hat{x}_i - x_i)(\hat{x}_j - x_j) | x]$.

Then, from Eq. (21), to first order,


Thus,  Eq.  (29)  says  that,  provided  the  first  order  expansion 
holds , 


=  E 


<e)[? 


xt)  (Xj 


It  is  also  true  that,  to  first  order, 

E(e)[(Xi  -  Xi)(x.  -  X.)  I  u,  v] 

=  J  "  xj)  I  u>  v]  p(u)  du 


(30) 


(31) 


(32) 


Thus,  provided  the  first  order  expansions  are  valid,  the 
"equivalence"  of  the  a-priori  statistics  to  the  adjoining  of 
"equivalent"  observations  applies  not  only  in  the  sense  that  the 
MAP  estimates  in  the  original  problem  are  equal  to  the  ML  estimates 
in  the  equivalent  problem  (which  holds  regardless  of  the  validity  of 
the  first  order  expansions),  but  also  in  that  the  estimation  error 
covariance  matrix  for  the  original  problem  can  be  derived  from  that 
for  the  equivalent  problem  via  Eq.  (31)  or  Eq.  (32). 

In  the  case  where  the  matrix  B  is  independent  of  x,  then  so  also 


are  both  sides  of  Eq.  (31). 
Another useful form of the first order equations for $\hat{x}_i$ can be
stated in the case where all components of x have a-priori statistics,
i.e., where x = u.

From Eqs. (20), (21), (23) and (25) of Ref. 9 one can state that,
to first order,


A 


I 


(X) 


(33) 


P*  (x) 


-I 

P'1  u 


S  f  (x,  t  ) 
p.  P- 


B  x 


j 


r 

v 


(x, 


t  ) 

V) 


(34) 


$$r_\mu(x, t_\mu) = S_\mu - f_\mu(x, t_\mu) \tag{35}$$


Attention  will  now  be  turned  to  the  statement  of  a  result 
according  to  which,  in  some  cases,  parameters  having  a-priori 
statistics  can  be  considered  equivalent  to  additive  noise,  insofar 
as  concerns  estimation  of  other  parameters;  or  conversely,  noise  can 
be  considered  to  be  represented  by  additional  parameters  to  be 
estimated. 

Our  initial  statement  of  this  result  can,  however,  be  stated  in 
a  form  which  does  not  involve  statistical  concepts: 

Let 


$$x = (u, v) \tag{36}$$

$$S_\mu = f_\mu(u, v, t_\mu) + \epsilon_\mu = g_\mu(u, t_\mu) + h_\mu(v, t_\mu) + \epsilon_\mu \tag{37}$$

$$Q(u, v) = \sum_{\mu, \nu} \eta_{\mu\nu} [S_\mu - g_\mu(u, t_\mu) - h_\mu(v, t_\mu)] [S_\nu - g_\nu(u, t_\nu) - h_\nu(v, t_\nu)] + \sum_{k, l} \lambda_{kl} (u_k - \bar{u}_k)(u_l - \bar{u}_l) + \sum_{k, l} \xi_{kl} (v_k - \bar{v}_k)(v_l - \bar{v}_l) \tag{38}$$

In Eq. (38), the sum involving terms
$\lambda_{kl}(u_k - \bar{u}_k)(u_l - \bar{u}_l)$ is extended over a subset
(not necessarily proper) of the indices of $\{u_k\}$; i.e.,
$\lambda_{kl} = 0$ unless both k and l belong to some subset I of the
indices of u. However, $(\lambda_{kl})$ is assumed to be positive definite
when k, l are restricted to I. On the other hand, the matrix $(\xi_{kl})$
is assumed to be positive definite with k, l ranging over all the
indices of $\{v_k\}$.

Suppose that the estimates $\hat{u}$, $\hat{v}$ are obtained by finding
those values which minimize Q(u, v) with respect to u and v.

Now consider

$$Q^*(u) = \sum_{\mu, \nu} \eta^*_{\mu\nu} [S_\mu - g_\mu(u, t_\mu) - h_\mu(\bar{v}, t_\mu)] [S_\nu - g_\nu(u, t_\nu) - h_\nu(\bar{v}, t_\nu)] + \sum_{k, l} \lambda_{kl} (u_k - \bar{u}_k)(u_l - \bar{u}_l) \tag{39}$$

where

$$\eta^* = (\eta^{-1} + \phi^*)^{-1} \tag{40}$$

and

$$\phi^*_{\mu\nu} = \sum_{k, l} [\xi^{-1}]_{kl} \frac{\partial h_\mu(\bar{v}, t_\mu)}{\partial v_k} \frac{\partial h_\nu(\bar{v}, t_\nu)}{\partial v_l} \tag{41}$$

(In Eq. (40), $\eta^*$, $\eta$, and $\phi^*$ are matrices.)

Let $\hat{u}^*$ be the estimate of u obtained by minimizing $Q^*(u)$ with
respect to u. Then,

$$\hat{u}^* = \hat{u}, \text{ to first order} \tag{42}$$
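
The result of Eqs. (36) - (42) can be checked numerically when g and h are linear, in which case the first order statement becomes exact. The sketch below assumes scalar u and v, two observations with g = a*u and h = b*v, and arbitrary illustrative numbers; the names `a`, `b`, `lam`, `xi`, `phi_star`, `eta_star` and the helper functions are assumptions introduced for this example only.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def bilinear(x, m, y):
    """Return the sum over i, j of x[i] * m[i][j] * y[j] for length-2 vectors."""
    return sum(x[i] * m[i][j] * y[j] for i in range(2) for j in range(2))

a = [1.0, 2.0]            # g(u, t) = a * u for the two observations
b = [1.0, -1.0]           # h(v, t) = b * v for the two observations
s = [3.0, 1.5]            # actual observations (illustrative values)
eta = [[2.0, 0.3],
       [0.3, 1.0]]        # smoothing matrix for the actual observations
lam, u_bar = 0.5, 0.2     # a-priori weight and centre for u
xi, v_bar = 4.0, 0.5      # a-priori weight and centre for v

# Problem 1: minimize Q(u, v) jointly via the 2x2 normal equations.
normal = [[bilinear(a, eta, a) + lam, bilinear(a, eta, b)],
          [bilinear(b, eta, a), bilinear(b, eta, b) + xi]]
rhs = [bilinear(a, eta, s) + lam * u_bar,
       bilinear(b, eta, s) + xi * v_bar]
normal_inv = inv2(normal)
u_joint = normal_inv[0][0] * rhs[0] + normal_inv[0][1] * rhs[1]

# Problem 2: treat v as extra noise, using the modified smoothing matrix
# eta_star = (eta^-1 + phi_star)^-1 with phi_star = b b^T / xi.
eta_inv = inv2(eta)
phi_star = [[b[m] * b[n] / xi for n in range(2)] for m in range(2)]
eta_star = inv2([[eta_inv[m][n] + phi_star[m][n] for n in range(2)]
                 for m in range(2)])
shifted = [s[m] - b[m] * v_bar for m in range(2)]   # S minus h evaluated at v_bar
u_reduced = ((bilinear(a, eta_star, shifted) + lam * u_bar)
             / (bilinear(a, eta_star, a) + lam))
```

In this linear case the two estimates of u coincide up to rounding error, which is the matrix inversion lemma in disguise; for nonlinear g and h the agreement would hold only to first order.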

It  is  to  be  noted  that  the  result  stated  in  Eqs.  (36)  -  (42) 
has  been  stated,  and  can  be  proved,  entirely  without  recourse  to 
statistical  concepts  or  interpretations.  Before  outlining  the 
proof,  however,  the  statistical  motivation  will  be  described: 

Suppose we now regard $\{\epsilon_\mu\}$ as a random process with means
zero and inverse covariance matrix $(\eta_{\mu\nu})$ and suppose

(a) a-priori statistics are associated with some of the components
of u, having means $\bar{u}_k$ and inverse covariance matrix
$(\lambda_{kl})$

(b) a-priori statistics are associated with all of the components
of v, having means $\bar{v}_k$ and inverse covariance matrix $(\xi_{kl})$

(c) $\{u_k\}$, $\{v_k\}$, and $\{\epsilon_\mu\}$ are statistically
uncorrelated
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Write  the  actual  observations  as 


S 

M. 


f  (u,  V, 

u 


V)  +  V  °  8u(u' 


t  )  +  h  (v  ,  t  )  +  € 

ML  Ml  M.  ML 


(43) 


g  (u,  t  )  +  h  (v,  t  )  +  e  +  /  (v  . 

P  ml  ml  ml  U.  L  k  k 


d  h  (v,  t  ) 
v,  )  - ^ - ii- 

'  -V 


B  v. 


_  b  h  (v,  t  ) 

If  we  regard  e  +  )  (v,  -  v  )  - l— 

ml  L  k  k  g  v 


as  the  "noise",  then  this  noise  will  be  a  random  process  with  mean 

•k 

zero  and  inverse  covariance  matrix  T]  .  Consequently,  Eq.  (42)  can 
be  interpreted  as  follows: 

Suppose  the  original  problem,  in  which  u  and  v  are  to  be 
jointly  estimated,  is  replaced  by  another  problem  in  which  u  only 
is  to  be  estimated;  v  is  eliminated  from  the  problem,  and  also  the 
virtual  observations  equivalent  to  the  a-priori  statistics  of  v  are 
eliminated. 

The actual observations are replaced by

$$ S_\mu = f^*_\mu(u, t_\mu) + e^*_\mu \tag{44} $$

where

$$ f^*_\mu(u, t_\mu) = g_\mu(u, t_\mu) + h_\mu(\bar{v}, t_\mu) \tag{45} $$

and $\{e^*_\mu\}$ is a random process with zero means and inverse covariance
matrix $\eta^*$.

$\hat{u}^*$ is the estimate for $u$ obtained in this second problem (i.e.,
the ML-MAP estimate if conditions A' and B' hold).

Equation (42) states that, to first order, if A' and B' hold,
the ML-MAP estimate $\hat{u}$ in the first problem is equal to the ML-MAP
estimate $\hat{u}^*$ in the second problem.

The  proof  involves  some  rather  tedious  matrix  algebra.  An 
outline  is  as  follows: 


a) If the functions $g$ and $h$ are reasonably well-behaved, the
estimate $\hat{u}$ can be obtained as follows (assuming we can effectively
restrict the problem to the immediate neighborhood of $\bar{u}$, $\bar{v}$):

First find, for any fixed $u$, the value $\hat{v}(u)$ which minimizes
$Q(u, v)$ with respect to $v$.

Then let

$$ Q'(u) = Q\left(u, \hat{v}(u)\right) \tag{46} $$

Then, $\hat{u}$ is that value which minimizes $Q'(u)$.

The proof of Eq. (42) then consists in verifying that, to first
order, $Q'(u) = Q^*(u)$.
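
The equivalence asserted by Eq. (42) can be checked numerically. The following is a minimal sketch in Python (NumPy), not part of the original paper. It assumes the linear special case $g(u, t_\mu) = \sum_k G_{\mu k} u_k$, $h(v, t_\mu) = \sum_k H_{\mu k} v_k$, in which "to first order" becomes exact, and it assumes that $Q(u, v)$ of Eq. (38) is the quadratic form with weighting matrices $\eta$, $\lambda$, and $\xi$ described above. The matrices `G` and `H`, the dimensions, and all function names are illustrative assumptions.

```python
import numpy as np


def random_spd(rng, k):
    """Return a random symmetric positive definite k x k matrix."""
    a = rng.normal(size=(k, k))
    return a @ a.T + k * np.eye(k)


def joint_estimate(S, G, H, eta, lam, xi, u_bar, v_bar):
    """Minimize the quadratic Q(u, v) jointly; return the u part of the minimizer."""
    n_u = G.shape[1]
    A = np.hstack([G, H])                      # stacked design matrix for (u, v)
    P = np.zeros((A.shape[1], A.shape[1]))     # block-diagonal a-priori weights
    P[:n_u, :n_u] = lam
    P[n_u:, n_u:] = xi
    prior = np.concatenate([u_bar, v_bar])
    z = np.linalg.solve(A.T @ eta @ A + P, A.T @ eta @ S + P @ prior)
    return z[:n_u]


def reduced_estimate(S, G, H, eta, lam, xi, u_bar, v_bar):
    """Minimize Q*(u) of Eq. (39), with eta* built as in Eqs. (40)-(41)."""
    phi_star = H @ np.linalg.inv(xi) @ H.T                   # Eq. (41) for linear h
    eta_star = np.linalg.inv(np.linalg.inv(eta) + phi_star)  # Eq. (40)
    resid = S - H @ v_bar                                    # S - h(v_bar, t)
    return np.linalg.solve(G.T @ eta_star @ G + lam,
                           G.T @ eta_star @ resid + lam @ u_bar)


rng = np.random.default_rng(0)
n_obs, n_u, n_v = 12, 3, 2
G = rng.normal(size=(n_obs, n_u))
H = rng.normal(size=(n_obs, n_v))
eta = random_spd(rng, n_obs)
xi = random_spd(rng, n_v)
lam = np.zeros((n_u, n_u))
lam[:2, :2] = random_spd(rng, 2)   # a-priori statistics on the subset I = {0, 1} only
u_bar = np.array([0.3, -0.7, 0.0])
v_bar = np.array([1.0, -0.5])
S = rng.normal(size=n_obs)

u_joint = joint_estimate(S, G, H, eta, lam, xi, u_bar, v_bar)
u_reduced = reduced_estimate(S, G, H, eta, lam, xi, u_bar, v_bar)
print(np.max(np.abs(u_joint - u_reduced)))   # difference at rounding-error level
```

In this sketch the nuisance parameters $v$ never appear in the reduced problem; their effect is carried entirely by the modified weighting matrix $\eta^*$.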


II. 3.  The  Linear  Case  with  an  Infinite  Set  of  Parameters 

The primary aim of this subsection is to apply the results of
Section II. 2 to the linear case with an infinity of unknown parameters;
as will be seen, this amounts to another way of looking at
standard linear minimum-variance filtering theory. First, brief
discussions will be given of linearity vs. non-linearity, and of
finite vs. infinite parameter sets.

The Linear Case

We will define the linear case to be that case in which the
functions $f_\mu$ depend linearly on the unknown parameters. In general,
this may be written

$$ f_\mu(x, t_\mu) = \sum_i x_i \, g_{\mu i}(t_\mu) + h(t_\mu) \tag{47} $$

where $g_{\mu i}(t)$ and $h(t)$ are known functions.

It  should  be  noted  that  the  virtual  observations  which  are 
equivalent  to  a-priori  statistics  are  of  the  form  given  by  Eq.  (1) 
and  are  therefore  automatically  in  the  linear  form.  Thus,  if  the 
functions  f  describing  the  actual  observations  satisfy  Eq.  (47),  then 
all  the  observations,  both  actual  and  virtual,  are  of  the  linear  form. 

In  such  a  case,  the  following  statements  can  be  made: 


(a) The estimates $\{\hat{x}_i\}$, if conditions A' and B' are satisfied, are
exact  minimum  variance  unbiased  estimates.  If  only  is  satisfied, 
they  are  still  exact  minimum  variance  unbiased  linear  estimates. 

In the case of non-linear dependence of $f_\mu$ on the parameters,
one can still in many cases say that the estimates $\{\hat{x}_i\}$ are
asymptotically minimum variance; this is discussed further shortly.


(b) All the results obtained "to first order" in Section II. 2 can
now be said to hold exactly without restriction on the magnitudes
of $\{e_\mu\}$ and of $\{x_i - \bar{x}_i\}$. Also, the matrices B and partial
derivatives $\partial f_\mu / \partial x_i$ are independent of the values of $\{x_i\}$.

In the non-linear case, for sufficiently small values of $\{e_\mu\}$
and of $\{x_i - \bar{x}_i\}$, the problem can be linearized and the results
obtained to first order will be approximately correct.

Actually, in many cases, this linearization will lead to correct
results even in cases where the individual values of $\{e_\mu\}$ and
$\{x_i - \bar{x}_i\}$ are fairly large, provided the total (integrated)
signal-to-noise ratio, in some appropriately defined sense, is large.(2)
However, there are some subtle pitfalls connected with determining
the requirements on output signal-to-noise ratio in order to ensure
that the results obtained from the linearized problem are correct.
This is discussed at some length in Ref. 2.

One  can  give  the  following  heuristic  condition  for  the  signal- 
to-noise  ratio  required  in  order  that  the  solutions  obtained  from 
the  linearized  problem  be  approximately  correct. 

Suppose that the true parameter values are denoted by $\{x_i\}$, and
that there exists a region $R$ containing $x = \{x_i\}$ such that, for all
$x'$, $x''$ in $R$, and all $\mu$,

$$ f_\mu(x', t_\mu) = f_\mu(x'', t_\mu) + \sum_i (x'_i - x''_i) \frac{\partial f_\mu(x'', t_\mu)}{\partial x_i} + \text{remainder} \tag{48} $$

where the remainder term is negligible within $R$.

Also suppose that the output signal-to-noise ratio is sufficiently
high so that, with probability approaching unity, a preliminary
estimate $x_0$ can be obtained with $x_0 \in R$.

Then, with probability equal essentially to unity, one can
replace the original problem with the linearized problem in which
the observations are replaced by $S_\mu - f_\mu(x_0, t_\mu)$; the parameters to
be estimated are replaced by $\{x_i - x_{0i}\}$; and the functions $f_\mu$ are
replaced by

$$ f^*_\mu = \sum_i (x_i - x_{0i}) \frac{\partial f_\mu(x_0, t_\mu)}{\partial x_i} \tag{49} $$

Thus, the condition is that the signal-to-noise ratio be
sufficiently high that the problem can, with probability essentially
equal to unity, be confined to a region $R$ around $x$ where the
variation of $f_\mu$ with $\{x_i\}$ is linear except for a negligible remainder.
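
The linearized problem just described can be sketched numerically as follows. This is an illustrative Python (NumPy) example, not taken from the paper: the observation model `f` (a sinusoid with unknown amplitude and frequency), the noise level, and the preliminary estimate `x0` are all hypothetical choices made only to show one generalized least squares step on the residuals $S_\mu - f_\mu(x_0, t_\mu)$ with the partial derivatives evaluated at $x_0$.

```python
import numpy as np


def f(x, t):
    """Hypothetical non-linear observation model f(x, t) = x[0] * sin(x[1] * t)."""
    return x[0] * np.sin(x[1] * t)


def jacobian(x, t):
    """Partial derivatives of f with respect to x[0] and x[1], one row per time."""
    return np.column_stack([np.sin(x[1] * t), x[0] * t * np.cos(x[1] * t)])


def linearized_estimate(S, t, x0, eta):
    """Solve the linearized problem around x0 and return the updated estimate."""
    J = jacobian(x0, t)                 # partials evaluated at the preliminary estimate
    resid = S - f(x0, t)                # observations replaced by S - f(x0, t)
    dx = np.linalg.solve(J.T @ eta @ J, J.T @ eta @ resid)   # estimate of x - x0
    return x0 + dx


rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 50)
x_true = np.array([2.0, 3.0])
sigma = 1e-3                            # small noise, i.e. high signal-to-noise ratio
S = f(x_true, t) + sigma * rng.normal(size=t.size)
eta = np.eye(t.size) / sigma**2         # inverse covariance of the observation errors

x0 = np.array([2.05, 2.95])             # preliminary estimate, assumed to lie in R
x_hat = linearized_estimate(S, t, x0, eta)
err_before = np.linalg.norm(x0 - x_true)
err_after = np.linalg.norm(x_hat - x_true)
print(err_before, err_after)
```

The sketch only illustrates the mechanics; whether a given preliminary estimate actually lies in a region where the remainder is negligible depends on the problem, as the text cautions.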


Finite  vs.  Infinite  Parameter  Sets 

In  typical  cases,  the  estimation  errors  due  to  observation 
errors  in  least-squares  smoothing  methods  of  the  kind  under 
discussion  increase  as  the  number  of  parameters  to  be  estimated 
increases.  In  many  cases,  as  the  number  of  parameters  to  be  estimated 
approaches  infinity,  the  estimation  errors  become  equal  to  the 
observation  errors  so  that  all  smoothing  is  lost. 

For example, suppose

$$ S(t) = x(t) + e(t) \tag{50} $$

where $x(t)$ is some function of time, the signal, to be estimated, and
$e(t)$ is the noise. If we allow $x(t)$ to be represented by a countable
infinity of parameters without a-priori statistics, and then apply
generalized least-squares smoothing, the resulting estimates are

$$ \hat{x}(t) = S(t) \tag{51} $$

and the estimation error is just $e(t)$.

Ordinarily, one gets smoothing by fitting $x(t)$ by a set of
functions depending on a small number of parameters, such as low-order
polynomials or trigonometric series. This reduces the
estimation errors due to observation noise, but if the actual
functions $x(t)$ do not belong precisely to the set of functions used
in the fitting procedure, another kind of error is introduced which
is sometimes called "bias error" (although it has nothing to do with
biases in the observation errors).

Usually, the procedure is "optimized" by choosing the number
of parameters, e.g., the order of the polynomial or the number of
terms in the trigonometric series, so that the sum of the "bias"
errors and the errors due to observation noise is minimized. This
"optimization" is facilitated if one has some sort of a-priori
knowledge as to how closely the functions $x(t)$ which actually
characterize the observations can be approximated by functions
belonging to the set used to fit the observations.

According to the viewpoint adopted here, smoothing can be retained
even though $x(t)$ continues to be represented by an infinite
(countable) parameter set, provided these parameters are given a
joint a-priori statistical distribution. This is equivalent to
adding an infinite set of virtual observations which is sufficient to
retain full smoothing even with an infinite number of parameters.

Of course, another way of looking at it would be that this is
equivalent to regarding $x(t)$ as a random process, and is in fact
just another way of interpreting the standard linear minimum variance
filtering in which the signal as well as the noise is regarded as a
random process.

This  equivalence  will  be  made  explicit  in  a  moment.  However, 
it  would  be  well  to  mention,  at  this  point,  that  examples  can  be 
found  of  problems  in  which  there  are  an  infinite  number  of  unknown 
parameters,  but  in  which  smoothing  can  be  obtained  without  associating 
a-priori  statistics  with  any  of  the  unknown  parameters.  An  example 
of  this  sort  is  given  in  the  appendix. 
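
The loss and recovery of smoothing described above can be illustrated on a finite grid. The following Python (NumPy) sketch is an illustration only: it uses one parameter per sample point as a finite stand-in for the infinite parameter set, and the exponential signal covariance, the white observation noise, and the grid size are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
t = np.linspace(0.0, 1.0, N)

# Assumed a-priori covariance of the signal samples x(t_mu) (exponential kernel).
Phi_x = np.exp(-np.abs(t[:, None] - t[None, :]) / 0.2)
sigma_e = 1.0
Phi_e = sigma_e**2 * np.eye(N)          # assumed white observation noise

x = np.linalg.cholesky(Phi_x) @ rng.normal(size=N)   # one sample function of the signal
e = sigma_e * rng.normal(size=N)
S = x + e                                            # Eq. (50) on the grid

eta = np.linalg.inv(Phi_e)

# One free parameter per sample and no a-priori statistics:
# minimizing (S - x)' eta (S - x) simply returns the observations.
x_no_prior = np.linalg.solve(eta, eta @ S)

# Same parameters, but with virtual observations "x = 0" weighted by inv(Phi_x).
C = np.linalg.inv(Phi_x)
x_with_prior = np.linalg.solve(eta + C, eta @ S)

mse_no_prior = np.mean((x_no_prior - x) ** 2)      # equals the mean of e**2
mse_with_prior = np.mean((x_with_prior - x) ** 2)
print(mse_no_prior, mse_with_prior)
```

Without the virtual observations the estimate reproduces $S(t)$ and the error is just the noise; with them, the same number of parameters yields a smoothed estimate.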

To make the above-described interpretation of linear least-squares
filtering explicit, suppose

$$ S(t) = x(t) + e(t), \qquad t_1 \le t \le t_2 \tag{52} $$

where $x(t)$ and $e(t)$ are sample functions of random processes $\{x(t)\}$,
$\{e(t)\}$. It is assumed that one knows a-priori that $\{x(t)\}$ is
defined and continuous in the mean over an interval $(T_1, T_2)$ containing
$(t_1, t_2)$, with zero mean and covariance function $\Phi_x(s, t)$; while
$\{e(t)\}$ is defined and continuous in the mean over $(t_1, t_2)$ with zero
mean and covariance function $\Phi_e(s, t)$. $\{x(t)\}$ and $\{e(t)\}$ are assumed
to be statistically uncorrelated.

Now, let $\hat{x}(t)$ be the estimate of $x(t)$ obtained from standard
linear least squares theory for $t \in (T_1, T_2)$. (If $t \in (t_1, t_2)$ we
have a true filtering or interpolation problem, otherwise an extrapolation
problem.)

Now, consider the following equivalent problem. Suppose the
actual observations are $S(t)$, $t_1 \le t \le t_2$. Also suppose the virtual
observations are defined as follows:

$$ \bar{x}(t) = \text{"observed value" of } x(t) = 0, \qquad T_1 \le t \le T_2 \tag{53} $$

(since our assumption is that $\{x(t)\}$ is a zero mean process).

$$ \text{Virtual observation error} = \{e^*(t)\}, \qquad T_1 \le t \le T_2 \tag{54} $$

where $\{e^*(t)\}$ is a zero mean random process with covariance function
$\Phi_x(s, t)$. (The actual observation error is the same as before.)

Let $\hat{x}^*(t)$ be the generalized least squares estimate obtained
for this equivalent version of the problem, $t \in (T_1, T_2)$. Then,

$$ \hat{x}^*(t) = \hat{x}(t) \tag{55} $$

This is actually a consequence of the results previously proved;
however, a direct verification is possible. This can most simply be
obtained by considering the discrete-time case in which $t$ is
restricted to discrete values $\{t_\mu\}$. The estimates for the continuous
time parameter can be obtained by a limiting process from the

discrete-time results.(2, 14)

The estimation process for the equivalent least squares smoothing
technique takes the following form: let

$$ Q(x) = {\sum_{\mu,\nu}}' \, \eta_{\mu\nu} \left[ S(t_\mu) - x(t_\mu) \right] \left[ S(t_\nu) - x(t_\nu) \right] + \sum_{\mu,\nu} C_{\mu\nu} \, x(t_\mu) \, x(t_\nu) \tag{56} $$

where

$$ \eta = \Phi_e^{-1} \tag{57} $$

$$ C = \Phi_x^{-1} \tag{58} $$

The parameters to be estimated are $x_\mu = x(t_\mu)$, $t_\mu \in (T_1, T_2)$.

The second sum in Eq. (56) is extended over the whole interval
$(T_1, T_2)$, while the first sum is extended over only those $t_\mu$ in
$(t_1, t_2)$, as indicated by the prime symbol. The parameter estimates
$\hat{x}^*(t_\mu)$ are obtained by minimizing $Q(x)$ with respect to $x(t_\mu)$.

The fact that Eq. (55) holds, where $\hat{x}(t_\mu)$ are the estimates
resulting from standard linear least squares filtering theory, can
be verified directly by minimizing $Q(x)$ with respect to $x(t_\mu)$ by
setting the partials $\partial Q / \partial x_\mu = 0$, and comparing the results
with the standard formulas for $\hat{x}(t_\mu)$, as for example given in Ref. 14.

Incidentally, insofar as concerns estimation of any particular
value $x(t_0)$, $t_0 \in (T_1, T_2)$, the estimate $\hat{x}^*(t_0)$ requires only that
the interval over which the random process $\{x(t)\}$ is defined contain
$(t_1, t_2)$ and the point $t_0$; $\hat{x}^*(t_0)$ will be independent of whether
$\{x(t)\}$ is actually defined over the full interval $(T_1, T_2)$ if the
latter is larger than $(t_1, t_2)$. This is also, of course, true for
$\hat{x}(t_0)$.
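
The direct verification of Eq. (55) in the discrete-time case can be sketched as follows. This Python (NumPy) example is illustrative and not from the paper: it assumes that the "standard formula" is the usual linear minimum-variance estimate $\Phi_x(t_\mu, \cdot)\,[\Phi_x + \Phi_e]^{-1} S$ restricted to the observed points, and the particular covariance functions, grid, and observed index range are hypothetical.

```python
import numpy as np


def gls_estimate(S, obs, Phi_x, Phi_e):
    """Minimize Q(x) of Eq. (56): actual observations plus virtual observations x = 0."""
    N = Phi_x.shape[0]
    E = np.zeros((len(obs), N))
    E[np.arange(len(obs)), obs] = 1.0        # selects the observed grid points
    eta = np.linalg.inv(Phi_e)               # Eq. (57)
    C = np.linalg.inv(Phi_x)                 # Eq. (58)
    return np.linalg.solve(E.T @ eta @ E + C, E.T @ eta @ S)


def standard_linear_estimate(S, obs, Phi_x, Phi_e):
    """Assumed standard formula: Phi_x[:, obs] (Phi_x[obs, obs] + Phi_e)^-1 S."""
    K = Phi_x[np.ix_(obs, obs)] + Phi_e
    return Phi_x[:, obs] @ np.linalg.solve(K, S)


rng = np.random.default_rng(3)
N = 40
t = np.linspace(0.0, 1.0, N)                 # grid covering (T1, T2)
obs = np.arange(10, 30)                      # grid points lying in (t1, t2)
Phi_x = np.exp(-np.abs(t[:, None] - t[None, :]) / 0.3)
te = t[obs]
Phi_e = (0.5 * np.exp(-np.abs(te[:, None] - te[None, :]) / 0.05)
         + 0.1 * np.eye(len(obs)))           # correlated plus white noise
S = rng.normal(size=len(obs))                # the identity holds for any data vector

x_star = gls_estimate(S, obs, Phi_x, Phi_e)
x_hat = standard_linear_estimate(S, obs, Phi_x, Phi_e)
print(np.max(np.abs(x_star - x_hat)))        # difference at rounding-error level
```

The grid points outside the observed range correspond to the extrapolation case; both formulas give the same values there as well.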

Another  result  which  is  a  direct  consequence  of  the  results 
stated  at  the  end  of  Section  II.  2  is  as  follows: 

Let

$$ S(t) = \sum_{i=1}^{n} x^{(i)}(t) \tag{59} $$

where $x^{(i)}(t)$ are sample functions from random processes $\{x^{(i)}(t)\}$
which have zero means, covariance functions $\Phi^{(i)}(s, t)$ and are
mutually uncorrelated.

Suppose the indices $i = 1, \ldots, n$ are divided into two sets,
$I$ and $I'$, and that the processes $\{x^{(i)}(t)\}$ for $i \in I$ are regarded
as "signals", while the "noise" is given by

$$ e(t) = \sum_{i \in I'} x^{(i)}(t) \tag{60} $$

Then, the estimate $\hat{x}^{(i)}(t)$ for a particular index $i = i_0 \in I$
is independent of $I - i_0$. That is, so long as $i_0 \in I$, the estimate
$\hat{x}^{(i)}(t)$ for $i = i_0$ is the same regardless of which of the remaining
processes $x^{(i)}(t)$, $i \ne i_0$, are considered as signals to be jointly
estimated, and which are lumped into the noise.

We can even go to the extreme of considering all of the processes,
$i = 1, \ldots, n$, as signals to be jointly estimated. In the equivalent
generalized least squares formulation, the "actual" observations
would then be considered to be error-free and the only errors would
be associated with the "virtual" observations.
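
The independence of the estimate from the signal/noise partition can be checked numerically in the discrete-time case. The following Python (NumPy) sketch is illustrative only; the three covariance matrices and the helper names are assumptions. It estimates process 0 under several partitions and compares each with the closed form $\Phi^{(0)} \left[ \sum_i \Phi^{(i)} \right]^{-1} S$.

```python
import numpy as np


def exp_cov(t, scale, var):
    """Hypothetical exponential covariance matrix on the grid t."""
    return var * np.exp(-np.abs(t[:, None] - t[None, :]) / scale)


def estimate_process0(S, covs, signal_idx):
    """Jointly estimate the processes in signal_idx (which must contain 0),
    lumping all remaining processes into the noise; return the estimate of process 0."""
    N = len(S)
    noise_cov = sum(covs[i] for i in range(len(covs)) if i not in signal_idx)
    eta = np.linalg.inv(noise_cov)
    k = len(signal_idx)
    A = np.hstack([np.eye(N)] * k)           # observation = sum of the signal processes
    C = np.zeros((k * N, k * N))             # virtual-observation weights, block diagonal
    for b, i in enumerate(signal_idx):
        C[b * N:(b + 1) * N, b * N:(b + 1) * N] = np.linalg.inv(covs[i])
    z = np.linalg.solve(A.T @ eta @ A + C, A.T @ eta @ S)
    pos = signal_idx.index(0)
    return z[pos * N:(pos + 1) * N]


rng = np.random.default_rng(5)
N = 25
t = np.linspace(0.0, 1.0, N)
covs = [exp_cov(t, 0.5, 1.0), exp_cov(t, 0.1, 0.5), 0.2 * np.eye(N)]
S = rng.normal(size=N)

est_a = estimate_process0(S, covs, [0])       # processes 1 and 2 are noise
est_b = estimate_process0(S, covs, [0, 1])    # process 2 is noise
est_c = estimate_process0(S, covs, [0, 2])    # process 1 is noise
closed_form = covs[0] @ np.linalg.solve(sum(covs), S)
print(np.max(np.abs(est_a - est_b)), np.max(np.abs(est_a - est_c)))
```

The extreme case in which every process is treated as a signal would require error-free actual observations; it is represented here only by the closed form, which does not depend on the partition.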
III.  Recursive  Solutions  of  Signal  Estimation  Problems

III.  1.  Preliminary  Discussion 

Suppose one has a signal vector $x = (x_1, \ldots, x_m)$, which may be
a function of time; observational data which depends on the values
of the vector $x$; and additive observation errors. Recursive methods
for producing the generalized least squares estimate of $\{x_i\}$ at any
time $t$ have been studied by Swerling,(9, 10) Kalman,(11, 12) Bucy,(12)
Blum,(13) and others. These solutions have the feature that optimum
estimates based on previous data are combined with additional
observational data in an optimum way to produce new optimum estimates.

Swerling(9, 10) treated initially the case (either linear or
non-linear) where the vector $x$ is constant; then the modified case
where $x$ may depend on time but where the variation of $x$ with time
has known functional form; and finally the case where the variation
of $x$ with time has a component of unknown functional form, but
without dealing with the case where the unknown component of the time
variation of $x$ is associated with a-priori statistics (i.e., is
regarded as a random process).

Kalman(11, 12) and Bucy(12) treat the linear case with
uncorrelated observation errors, and also give the extension(12) to
the case where $x$ is regarded as a random process, with essentially
the assumption that $x$ is a vector Markov process (though its
individual components need not be Markov processes).

Blum(13) generalizes these recursive methods to cases where
the observation errors are correlated in ways not treated by the
other papers mentioned.

It is the purpose of this section to exhibit recursive solutions
for  the  case  where  x  is  a  random  process,  with  very  few  restrictive 
assumptions  on  the  statistics  of  x  or  of  the  noise  process. 
Essentially,  the  signal  processes  are  assumed  to  be  continuous  in 
the  mean;  the  noise  process  is  assumed  to  consist  of  a  component 
which  is  continuous  in  the  mean  and  a  white  noise  component;  and 
that  is  all.  Section  III.  2  treats  the  linear  case  with  additive 
noise;  and  Section  III.  3  gives  the  "first  order"  treatment  of  the 
non-linear  case,  where  some  of  the  noise  may  be  non-additive. 

The  method  of  approach  is  as  follows:  all  problems  of  this 
type  are  reduced  to  an  equivalent  problem  in  which 

(a)  There  is  a  (possibly)  infinite  set  of  parameters  to  be  estimated 

(b) The parameters to be estimated are regarded as constants
(independent of time)

(c)  A-priori  statistics  are  associated  with  the  parameters  to  be 
estimated  and  are  represented  in  the  generalized  least  squares 
procedure  in  the  form  of  equivalent  virtual  observations. 

(d) The observation errors are regarded as uncorrelated, i.e., have
covariance function $\Phi(s, t) = R(t)\,\delta(s - t)$. This is
accomplished  by  regarding  everything  except  the  "white  noise" 
component  of  the  observation  error  as  represented  by  parameters 
to  be  estimated. 

When the problem has been reduced to the form described by
(a) - (d), formulas of Swerling(9) can be applied directly. The
requisite formulas will be reproduced here, in the form applicable
to the discrete-time case. The result for a continuous time parameter
is then obtained by a limiting process.

Suppose we have observed data given by Eq. (12) above. Suppose
the matrix $(\eta_{\mu\nu})$ can be broken up into blocks. The fundamental
result of Swerling(9) is that a recursive procedure can be set up,
in which the observational data corresponding to each block of $\eta$ is
treated as a separate stage; at each stage, say the $s$-th, a generalized
least squares smoothing of the observation data in the $s$-th stage,
together with the estimates based on all previous stages, is defined.
The basic result is that this sequence of generalized least squares
estimates can be defined in such a way that the resulting estimates
(say, after the $s$-th stage) are, to first order, identical with those
resulting from the non-recursive smoothing of all $s$ stages using
the original matrix $\eta$. In the linear case, the qualifying phrase
"to first order" can be dropped. The specific form of the necessary
recursive sequence is exhibited.

The results assume a particularly simple form if the original
smoothing matrix $\eta$ is diagonal (or at least is diagonal after some
point). In this case, each observation $S_\mu$ can be considered a
separate stage. We will refer to this as introducing the observations
one-by-one.

Before exhibiting the formulas necessary for the ensuing application,
a few comments are in order about the interpretation of these
stagewise or recursive procedures. The most important comment is
that Swerling's(9) basic result is completely non-statistical, that
is, it can be stated and proved without recourse to statistical
notions; it holds regardless of the statistics of the observation
error, and in particular, of whether the basic smoothing matrix $\eta$ is
or is not the inverse covariance matrix of the observation errors.
Consequently, even if the errors are correlated, one is still at
liberty to use a diagonal $\eta$ and thus to use one-by-one recursive
methods.

Although the basic result is non-statistical, all the usual
statistical consequences can be derived from it when specific
statistics are associated with the observation errors. For example,
if conditions A and B of Section II hold, the results of the
non-recursive method are ML estimates, and consequently the results of
the recursive method are to first order the ML estimates.

If the original weighting matrix $\eta$ is not the inverse covariance
matrix of the observation errors, then the accuracy of the resulting
estimates will be degraded (the estimates will not be statistically
optimum); the amount of accuracy degradation can be computed(2) and
may be an acceptable price to pay, for example, for the computational
convenience of using a diagonal matrix $\eta$.

Blum(13) generalizes these recursive estimation procedures so
that they can be applied without loss of statistical optimality to
certain cases where the observation errors are correlated and where,
in fact, there is no way to break up their covariance matrix into
blocks. His approach is to assume that the observation errors
satisfy a non-homogeneous difference equation of order, say, $k$; his
recursive procedure then involves the previous $k + 1$ estimates instead
of just the last estimate.

The approach to be followed here (summarized above in statements
a - d) is applicable to either correlated or uncorrelated observation
errors, and results in all cases in statistically optimum estimates
(under the appropriate conditions A and B). The necessary equations
are equivalent to Eqs. (23), (25), (46), (47), (48) of Ref. 9.

Thus, suppose the matrix $(\eta_{\mu\nu})$ is diagonal:

$$ \eta_{\mu\nu} = \eta_\mu \, \delta_{\mu\nu} \tag{61} $$

Let $x = (x_1, \ldots, x_m)$ where $x_i$ are constants. Denote by $\hat{x}_i(s)$ the
estimate of $x_i$ based on the first $s$ observations. (We will also
assume that there is some initial estimate $\hat{x}_i(0)$ but need not specify
at this point just how this is obtained.)

Then, the one-by-one recursive procedure is defined as follows,
where $\hat{x}(s) = [\hat{x}_1(s), \ldots, \hat{x}_m(s)]$:

$$ \hat{x}(s) - \hat{x}(s - 1) = D^{(s)}[\hat{x}(s - 1)] \; p^*[\hat{x}(s - 1)] \tag{62} $$

$$ D^{(s)}[x] = D^{(s-1)}(x) - d^{(s)}(x) \, d^{(s)\prime}(x) \tag{63} $$

$$ d_i^{(s)}(x) = \left\{ \eta_s^{-1} + \sum_{j,k=1}^{m} \frac{\partial f_s(x, \tau_s)}{\partial x_j} \frac{\partial f_s(x, \tau_s)}{\partial x_k} D_{jk}^{(s-1)}(x) \right\}^{-1/2} \sum_{k=1}^{m} \frac{\partial f_s(x, \tau_s)}{\partial x_k} D_{ik}^{(s-1)}(x) \tag{64} $$

$$ p_i^*(x) = \eta_s \left[ S_s - f_s(x, \tau_s) \right] \frac{\partial f_s(x, \tau_s)}{\partial x_i} \tag{65} $$

In the above, $\hat{x}$, $p^*$, and $d^{(s)}$ are $m$-component vectors; $D^{(s)}$ is an
$m \times m$ matrix; $\tau_s$ is the time of the $s$-th observation.

The initial values of $\hat{x}$ and $D$, i.e., $\hat{x}(0)$ and $D^{(0)}$, have not
been defined; but this will be done when the intended application
is made.

In the linear case, the matrices $D^{(s)}$ and the partials $\partial f_s / \partial x_i$
are independent of the values of $x$ or $\hat{x}(s)$.
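
For the linear case, a one-by-one recursion of the form of Eqs. (62) - (65) can be sketched and compared with the non-recursive solution. The following Python (NumPy) example is a minimal illustration written in the familiar recursive least-squares form; it is not reproduced from Ref. 9. It assumes a linear model $f_s(x, \tau_s) = \sum_i a_{si} x_i$, a diagonal weighting matrix with entries `eta_diag[s]`, and an initial matrix `D0` interpreted as the inverse of the weighting matrix of the virtual observations; all names are hypothetical.

```python
import numpy as np


def recursive_gls(S, A, eta_diag, x0, D0):
    """Introduce the observations one by one, updating the estimate x and matrix D."""
    x = x0.copy()
    D = D0.copy()
    for s in range(len(S)):
        a = A[s]                                    # partials of f_s w.r.t. x (constant here)
        Da = D @ a
        d = Da / np.sqrt(1.0 / eta_diag[s] + a @ Da)   # vector d for stage s
        D = D - np.outer(d, d)                      # updated D for stage s
        p_star = eta_diag[s] * (S[s] - a @ x) * a   # weighted residual times partials
        x = x + D @ p_star                          # updated estimate
    return x, D


def batch_gls(S, A, eta_diag, x0, D0):
    """Non-recursive smoothing of all stages at once, for comparison."""
    D0_inv = np.linalg.inv(D0)
    M = D0_inv + A.T @ (eta_diag[:, None] * A)
    rhs = D0_inv @ x0 + A.T @ (eta_diag * S)
    return np.linalg.solve(M, rhs), np.linalg.inv(M)


rng = np.random.default_rng(4)
m, n_obs = 4, 25
A = rng.normal(size=(n_obs, m))
eta_diag = rng.uniform(0.5, 2.0, size=n_obs)
x0 = np.zeros(m)
D0 = np.diag([1.0, 2.0, 0.5, 1.5])
S = rng.normal(size=n_obs)

x_rec, D_rec = recursive_gls(S, A, eta_diag, x0, D0)
x_bat, D_bat = batch_gls(S, A, eta_diag, x0, D0)
print(np.max(np.abs(x_rec - x_bat)))   # difference at rounding-error level
```

Because the model is linear, the recursive and non-recursive results agree exactly (up to rounding), which is the sense in which "to first order" can be dropped.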


III.  2.  Application  to  the  Linear  Case 

We will assume that the observational data up to any time
instant $\tau$ is

$$ S(t) = \sum_{i=1}^{n-1} u^{(i)}(t) + e^*(t), \tag{66} $$

where $u^{(i)}(t)$ is a sample function of a random process $\{u^{(i)}(t)\}$.
The processes $\{u^{(i)}(t)\}$ will be interpreted as signal processes and
$\{e^*(t)\}$ as a noise process.

The following assumptions will be made about these random
processes:

There is some basic interval $(T_1, T_2)$ containing $(t_1, \tau)$ within
which it is known a-priori that:

(a) $\{u^{(i)}(t)\}$, $i = 1, \ldots, n - 1$, are random processes which are
mutually uncorrelated; have zero means; and are continuous in the
mean with covariance function $\Phi^{(i)}(t, t')$.

(b) $\{e^*(t)\}$ is a random process which is uncorrelated with all the
processes $\{u^{(i)}(t)\}$, and which can be written

$$ \{e^*(t)\} = \{\delta(t)\} + \{e(t)\} \tag{67} $$

where $\{\delta(t)\}$ and $\{e(t)\}$ are mutually uncorrelated, zero mean,
and:

(i) $\{\delta(t)\}$ is continuous in the mean with covariance function
$\Phi^{(n)}(t, t')$.

(ii) $\{e(t)\}$ is a "generalized white noise" process.

The statement about $\{e(t)\}$ can be interpreted in various
alternative ways:

1. $\{e(t)\}$ has covariance function $R(t)\,\delta(t - t')$

2. If we define (more rigorously) another random process by

$$ \beta(\Delta t, t) = \int_t^{t + \Delta t} e(\tau) \, d\tau \tag{68} $$

then in the neighborhood of $t$, $\beta(\Delta t, t)$ has covariance function

$$ E\left[ \beta(\Delta t, t) \, \beta(\Delta t', t) \right] = R(t) \min(\Delta t, \Delta t') \tag{69} $$

If $R(t)$ = constant, then $\{e(t)\}$ has one-sided spectral density

$$ N_0 = 2R \tag{70} $$

It will be assumed that for $t \in (T_1, T_2)$, $R(t)$ is positive
and bounded away from zero. On the other hand, the non-white noise
component may vanish.

Now, if the non-white noise component $\delta(t)$ does not vanish,
we will define

$$ \{u^{(n)}(t)\} = \{\delta(t)\} \tag{71} $$

and write

$$ S(t) = \sum_{i=1}^{n} u^{(i)}(t) + e(t) \tag{72} $$

(If $\{u^{(n)}(t)\}$ vanishes, the sum in Eq. (72) simply extends to $n - 1$
instead of $n$.) Incidentally, the assumption that the processes
$\{u^{(i)}(t)\}$ are mutually uncorrelated is not really essential; it
will be made clear how the same technique could be applied in the
absence of this assumption.

Now, it will be assumed that the quantities to be estimated
are $u^{(i)}(t)$, $t \in (T_1, T_2)$, $i = 1, \ldots, n$. (Actually one is really
only interested in $i = 1, \ldots, n - 1$, but the technique calls for
treating the non-white noise component as if it were an $n$-th signal
component to be estimated.)


Let

$$ \hat{u}^{(i)}(t, \tau) = \text{linear minimum variance estimate of } u^{(i)}(t) \text{ based on the observational data from } t_1 \text{ to } \tau, \text{ for any } t \in (T_1, T_2). \tag{73} $$

The actual observed data are given by Eq. (72). The virtual
observations are given by

$$ \bar{u}^{(i)}(t) = 0, \qquad i = 1, \ldots, n, \quad t \in (T_1, T_2) \tag{74} $$

and the virtual observation errors are random processes having the
same statistics as $\{u^{(i)}(t)\}$.

The solution for $\hat{u}^{(i)}(t, \tau)$ will be obtained by first treating
the discrete-time case and later passing to the limit.

Thus, let $\{t_\mu\}$ be a set of equally spaced time points in
$(T_1, T_2)$:

$$ t_{\mu+1} - t_\mu = \Delta t, \qquad t_1 = T_1 \tag{75} $$

Let $\{\tau_s\}$ be a set of equally spaced time points in $(t_1, \tau)$:

$$ \tau_{s+1} - \tau_s = \Delta t \tag{76} $$

The parameters to be estimated are $u^{(i)}(t_\mu)$. The actual
observations are given by Eq. (72) with $t$ restricted to the points
$\{\tau_s\}$. The virtual observations are given by Eq. (74) with
$t$ restricted to the $\{t_\mu\}$. The $\{\tau_s\}$ are a subset of $\{t_\mu\}$.

It is assumed that the "zeroth" stage of the recursive procedure
is based only on the virtual observations, that is, on the a-priori
statistics of $\{u^{(i)}(t)\}$. Thus, let

$$ \hat{u}^{(i)}(t_\mu, s) = \text{estimate of } u^{(i)}(t_\mu) \text{ based on the first } s \text{ actual observations and the a-priori statistics} \tag{77} $$

$$ \hat{u}^{(i)}(t_\mu, 0) = \bar{u}^{(i)}(t_\mu) = 0 \tag{78} $$

The various covariance functions $\Phi^{(i)}(t, t')$ become covariance
matrices $\Phi^{(i)}(t_\mu, t_\nu)$ in the obvious manner. The only point that
needs discussion is the manner in which the white noise process
$\{e(t)\}$ is to be represented in the discrete-time case. The correct
representation is

$$ \Phi_e(t_\mu, t_\nu) = \frac{R(t_\mu)}{\Delta t} \, \delta_{\mu\nu} \tag{79} $$

where $\Phi_e$ is the covariance matrix of $\{e(t_\mu)\}$.
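
One way to see why the factor $1/\Delta t$ appears in Eq. (79) is sketched below. The sketch assumes that the discrete-time noise sample at $t_\mu$ is taken to be the average of $e(t)$ over a cell of width $\Delta t$; this averaging convention is an assumption made for illustration and is not stated in the text. Under that assumption, Eqs. (68) and (69) give

$$ \operatorname{Var}\left[ \frac{1}{\Delta t} \, \beta(\Delta t, t_\mu) \right] = \frac{R(t_\mu) \min(\Delta t, \Delta t)}{(\Delta t)^2} = \frac{R(t_\mu)}{\Delta t}, $$

while averages taken over non-overlapping cells are uncorrelated, which accounts for the Kronecker delta $\delta_{\mu\nu}$.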


The solution for the discrete-time case then consists of
applying Eqs. (62) - (65). This is straightforward, the main difficulty
being in keeping all the indices straight. Here, the parameter
vector $x = (x_1, \ldots, x_m)$ has components $u^{(i)}(t_\mu)$ arranged in some
order. Thus, $m \gg n$ in general.

The result is as follows. It should be noted that the indices
$i$, $j$, $k$, $l$ in the following run over $(1, \ldots, n)$; thus, they do not
have quite the same meaning as in Eqs. (62) - (65). In fact, the
indices $i$ appearing in Eqs. (62) - (65) correspond to pairs of
indices $(i, \mu)$ in the following. It is best to regard the following
equations as self-contained; Eqs. (62) - (65) were merely reproduced
to indicate the method of derivation, and the notation of
Eqs. (62) - (65) is not necessarily completely consistent with that
of the following.

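$$ \hat{u}^{(i)}(t_\mu, s) - \hat{u}^{(i)}(t_\mu, s - 1) = \frac{\Delta t}{R(\tau_s)} \left[ S(\tau_s) - \sum_{j=1}^{n} \hat{u}^{(j)}(\tau_s, s - 1) \right] \sum_{k=1}^{n} C_{ik}(t_\mu, \tau_s, s) \tag{80} $$

$$ \hat{u}^{(i)}(t_\mu, 0) = 0, \quad \text{all } i, \mu \tag{81} $$

$$ C_{ij}(t_\mu, t_\nu, s) = C_{ij}(t_\mu, t_\nu, s - 1) - d_i(t_\mu, s) \, d_j(t_\nu, s) \tag{82} $$

$$ d_i(t_\mu, s) = \sum_{k=1}^{n} C_{ik}(t_\mu, \tau_s, s - 1) \times \left[ \frac{\Delta t}{R(\tau_s)} \right]^{1/2} \left[ 1 + \left\{ \frac{\Delta t}{R(\tau_s)} \sum_{j,k=1}^{n} C_{jk}(\tau_s, \tau_s, s - 1) \right\} \right]^{-1/2} \tag{83} $$

$$ C_{ij}(t_\mu, t_\nu, 0) = \delta_{ij} \, \Phi^{(i)}(t_\mu, t_\nu) \tag{84} $$

The case where the random processes $\{u^{(i)}(t)\}$ are mutually
correlated is obtained simply by changing Eq. (84) to

$$ C_{ij}(t_\mu, t_\nu, 0) = \Phi^{(i,j)}(t_\mu, t_\nu) \tag{84 a} $$

where, denoting a-priori expected value by $E(\,\cdot\,)$,

$$ \Phi^{(i,j)}(t_\mu, t_\nu) = E\left[ u^{(i)}(t_\mu) \, u^{(j)}(t_\nu) \right] \tag{85} $$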

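We will now go to the limit by assuming that $\Delta t \to 0$. Then
we get

$$ \frac{\partial \hat{u}^{(i)}(t, \tau)}{\partial \tau} = \frac{1}{R(\tau)} \left[ S(\tau) - \sum_{j=1}^{n} \hat{u}^{(j)}(\tau, \tau) \right] \sum_{k=1}^{n} C_{ik}(t, \tau, \tau) \tag{86} $$

$$ \hat{u}^{(i)}(t, t_1) = 0, \quad \text{all } i \text{ and all } t \in (T_1, T_2). $$

$$ \frac{\partial}{\partial \tau} \left[ C_{ij}(t, t', \tau) \right] = - \frac{1}{R(\tau)} \left\{ \sum_{k,l=1}^{n} C_{ik}(t, \tau, \tau) \, C_{jl}(t', \tau, \tau) \right\} \tag{87} $$

$$ C_{ij}(t, t', t_1) = \delta_{ij} \, \Phi^{(i)}(t, t') \tag{88} $$

for all $i$, $j$ and all $t$ and $t'$ in $(T_1, T_2)$.

More generally, if $\{u^{(i)}(t)\}$ are mutually correlated,

$$ C_{ij}(t, t', t_1) = \Phi^{(i,j)}(t, t') \tag{88 a} $$

In the above, $\hat{u}^{(i)}(t, \tau)$ is the estimate of $u^{(i)}(t)$,
$t \in (T_1, T_2)$, based on the observational data from $t_1$ to $\tau$. The
indices $i$, $j$, $k$, $l$ as previously stated run from one to $n$.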

In  the  linear  case  which  has  been  treated  in  this  section,  the 
estimates  u^(t,  t)  are  precisely  the  minimum  mean  square  error 
estimates  of  u^(t),  based  on  actual  observations  up  to  t. 

The  functions  C^(t,  T)  have  the  following  interpretation 
(assuming  the  equivalent  of  conditions  A*  and  B*  of  Section  II 
apply): 
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c±j  (t,  t',  T)  =  E  [u(i)(t,  t)  -  u(i)(t)J  (89) 

x  ^u^^(t/,  t)  - 

It is useful to have the discrete-time formulas, Eqs. (80) - (84),
since, in the first place, in many applications the observed data
will be at discrete times; and second, even in the continuous-time
case, the solutions to Eqs. (86) - (88) would generally be built up
from a difference equation approximation. Since the possible
difference equation approximations are non-unique, Eqs. (80) - (84)
indicate the best one (cf. especially Eq. (83), of which the term
in braces disappears when $\Delta t \to 0$).
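
As an illustration of how the discrete-time recursion might be organized, the following is a minimal Python sketch of Eqs. (80) - (84) on a uniform time grid. It assumes one observation per grid point of the form S(t_s) = sum over j of u^(j)(t_s) plus noise, and it interprets the discretized noise as having variance R(t_s)/Δt. The function names, the array layout `C[i, j, mu, nu]`, and the batch cross-check are assumptions of this sketch and are not taken from the text.

```python
import numpy as np


def recursive_linear_estimate(S, R, dt, Phi):
    """Recursive estimates on a grid t_0..t_{M-1}, one observation per point.

    S   : (M,) observations, assumed S[s] = sum_j u^(j)(t_s) + noise
    R   : (M,) white-noise intensity R(t_s)
    dt  : grid spacing (Delta t)
    Phi : (n, M, M) a-priori covariances Phi^(i)(t_mu, t_nu)
    Returns u_hat of shape (n, M) and C of shape (n, n, M, M).
    """
    n, M, _ = Phi.shape
    C = np.zeros((n, n, M, M))
    for i in range(n):
        C[i, i] = Phi[i]                          # initial condition, Eq. (84)
    u_hat = np.zeros((n, M))                      # initial condition, Eq. (81)
    for s in range(M):
        r = dt / R[s]
        g = C[:, :, :, s].sum(axis=1)             # sum_k C_ik(t_mu, t_s, s-1)
        q = C[:, :, s, s].sum()                   # sum_jk C_jk(t_s, t_s, s-1)
        d = np.sqrt(r) * g / np.sqrt(1.0 + r * q)     # Eq. (83)
        C = C - np.einsum("im,jn->ijmn", d, d)        # Eq. (82)
        residual = S[s] - u_hat[:, s].sum()       # uses estimates from step s-1
        gain = C[:, :, :, s].sum(axis=1)          # sum_k C_ik(t_mu, t_s, s)
        u_hat = u_hat + r * residual * gain       # Eq. (80)
    return u_hat, C


def batch_estimate(S, R, dt, Phi):
    """Non-recursive linear minimum mean square error estimate, for comparison."""
    n, M, _ = Phi.shape
    P0 = np.zeros((n * M, n * M))
    for i in range(n):
        P0[i * M:(i + 1) * M, i * M:(i + 1) * M] = Phi[i]
    H = np.hstack([np.eye(M)] * n)                # S(t_s) = sum_j u^(j)(t_s)
    noise_cov = np.diag(R / dt)                   # assumed discretization of R
    K = P0 @ H.T @ np.linalg.inv(H @ P0 @ H.T + noise_cov)
    return (K @ S).reshape(n, M)


# Small usage example with two hypothetical exponential covariances.
t = np.linspace(0.0, 1.0, 6)
dt = t[1] - t[0]
Phi = np.stack([np.exp(-np.abs(t[:, None] - t[None, :]) / L) for L in (0.3, 1.0)])
R = np.full(t.size, 0.05)
S = np.random.default_rng(0).normal(size=t.size)
u_rec, C_final = recursive_linear_estimate(S, R, dt, Phi)
u_bat = batch_estimate(S, R, dt, Phi)
```

Under these assumptions the recursive and batch results should agree to rounding error, which gives a simple check on an implementation. Storing C for all pairs of grid points is what lets each new observation update the estimates at every time, past and future.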


III.  3.  Non-Linear  and  Non-additive  Application 

Suppose the actual observation data is given by

\[ S(t) = f\big[ t, u^{(1)}(t), \ldots, u^{(n)}(t) \big] + e(t), \quad T_1 \le t \le T_2 \tag{90} \]

where $\{u^{(i)}(t)\}$ are mutually uncorrelated continuous-in-the-mean
random processes over $(T_1, T_2)$, with covariance functions
$\Phi^{(i)}(t, t')$ and means zero; $\{e(t)\}$ is a generalized white noise
process with covariance function $R(t)\, \delta(t - t')$, uncorrelated with
the processes $\{u^{(i)}(t)\}$, and with zero mean.

It is assumed that some of the processes $\{u^{(i)}(t)\}$ may be
considered "signal" and others "noise" from the point of view of any
particular application; our technique calls, however, for all of
them to be jointly estimated. Also, some of them may be additive
and others not.

The recursive generalized least squares estimates $\hat{u}^{(i)}(t, T)$
can also be derived for this case by means of Eqs. (62) - (65). To
make the first order approximations valid for the non-linear case we
must now assume that at any time $T$ there exist estimates of
$u^{(i)}(t)$ which have small errors. We will assume that such estimates
are, in fact, given by $\hat{u}^{(i)}(t, T)$. In the discrete-time case this
becomes $\hat{u}^{(i)}(t_\mu, s)$.

In short, we will assume in the discrete-time case that
$\hat{u}^{(i)}(t_\mu, s) - u^{(i)}(t_\mu)$ is sufficiently small so that

\[ f\big[ t, u^{(1)}(t), \ldots, u^{(n)}(t) \big] = f\big[ t, \hat{u}^{(1)}(t, s), \ldots, \hat{u}^{(n)}(t, s) \big] + \sum_{i=1}^{n} \frac{\partial f}{\partial u^{(i)}} \big[ t, \hat{u}^{(1)}(t, s), \ldots, \hat{u}^{(n)}(t, s) \big] \big[ u^{(i)}(t) - \hat{u}^{(i)}(t, s) \big] + \text{negligible remainder.} \tag{91} \]

In the continuous-time case, simply replace $\hat{u}^{(i)}(t, s)$ by
$\hat{u}^{(i)}(t, T)$.


In the non-linear, non-additive case, the resulting generalized
least squares estimates can no longer necessarily be stated to be
minimum mean square error estimates. However, they can still be said
to be asymptotically minimum mean square error if Eq. (91) holds.

The main difference between this and the linear, additive case
previously discussed is that the functions $C_{ij}$ now depend on estimates
of $u^{(k)}(t)$, k = 1, ..., n. The necessary modifications of Eqs. (80) - (84)
or Eqs. (86) - (89) are as follows:

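In the discrete-time case, writing as usual $u = (u^{(1)}, \ldots, u^{(n)})$,

\[ \hat{u}^{(i)}(t_\mu, s) - \hat{u}^{(i)}(t_\mu, s-1) = \frac{\Delta t}{R(t_s)} \Big\{ S(t_s) - f\big[ t_s, \hat{u}(t_s, s-1) \big] \Big\} \times \sum_{k=1}^{n} C_{ik}(t_\mu, t_s, s)\, \frac{\partial f}{\partial u^{(k)}} \big[ t_s, \hat{u}(t_s, s-1) \big] \tag{92} \]

\[ \hat{u}^{(i)}(t_\mu, 0) = 0, \quad \text{all } i \text{ and } \mu \tag{93} \]

\[ C_{ij}(t_\mu, t_\nu, s) = C_{ij}(t_\mu, t_\nu, s-1) - d_i(t_\mu, s)\, d_j(t_\nu, s) \tag{94} \]

\[ d_i(t_\mu, s) = \Big[ \frac{\Delta t}{R(t_s)} \Big]^{1/2} \sum_{k=1}^{n} C_{ik}(t_\mu, t_s, s-1)\, \frac{\partial f}{\partial u^{(k)}} \times \Big\{ 1 + \frac{\Delta t}{R(t_s)} \sum_{j,k=1}^{n} C_{jk}(t_s, t_s, s-1)\, \frac{\partial f}{\partial u^{(j)}} \frac{\partial f}{\partial u^{(k)}} \Big\}^{-1/2} \tag{95} \]

where the partial derivatives in Eq. (95) are evaluated at $\big[ t_s, \hat{u}(t_s, s-1) \big]$, and

\[ C_{ij}(t_\mu, t_\nu, 0) = \delta_{ij}\, \Phi^{(i)}(t_\mu, t_\nu) \tag{96} \]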


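When $\Delta t \to 0$, we get

\[ \frac{\partial}{\partial T} \big[ \hat{u}^{(i)}(t, T) \big] = \frac{1}{R(T)} \Big\{ S(T) - f\big[ T, \hat{u}(T, T) \big] \Big\} \sum_{k=1}^{n} C_{ik}(t, T, T)\, \frac{\partial f}{\partial u^{(k)}} \big[ T, \hat{u}(T, T) \big] \tag{97} \]

\[ \hat{u}^{(i)}(t, T_1) = 0, \quad \text{all } t \in (T_1, T_2) \tag{98} \]

\[ \frac{\partial}{\partial T} \big[ C_{ij}(t, t', T) \big] = - \frac{1}{R(T)} \sum_{k,l=1}^{n} C_{ik}(t, T, T)\, C_{jl}(t', T, T)\, \frac{\partial f}{\partial u^{(k)}} \big[ T, \hat{u}(T, T) \big]\, \frac{\partial f}{\partial u^{(l)}} \big[ T, \hat{u}(T, T) \big] \tag{99} \]

\[ C_{ij}(t, t', T_1) = \delta_{ij}\, \Phi^{(i)}(t, t') \tag{100} \]

As before, if the assumption is dropped that $\{u^{(i)}(t)\}$ are
mutually uncorrelated, then Eq. (96) and Eq. (100) are replaced by
Eqs. (84 a) and (88 a), respectively.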
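
The non-linear recursion differs from the linear one only in that the observation function and its partial derivatives are evaluated at the latest estimates. The following Python sketch shows one way Eqs. (92) - (96) might be coded on a uniform grid. The function name, the argument layout, the convention that `f(t, u)` and `grad_f(t, u)` take the vector of current estimates at the observation time, and the reading of the discretized noise variance as R(t_s)/Δt are all assumptions of this sketch.

```python
import numpy as np


def recursive_nonlinear_estimate(S, R, dt, Phi, t, f, grad_f):
    """First-order (linearized) recursive estimates, one observation per grid point.

    S, R   : (M,) observations and white-noise intensity R(t_s)
    Phi    : (n, M, M) a-priori covariances Phi^(i)(t_mu, t_nu)
    t      : (M,) grid times
    f      : callable f(t_s, u) -> float, the observation function
    grad_f : callable grad_f(t_s, u) -> (n,) partial derivatives of f
    """
    n, M, _ = Phi.shape
    C = np.zeros((n, n, M, M))
    for i in range(n):
        C[i, i] = Phi[i]                               # Eq. (96)
    u_hat = np.zeros((n, M))                           # Eq. (93)
    for s in range(M):
        r = dt / R[s]
        u_s = u_hat[:, s].copy()                       # estimates from step s-1
        h = np.asarray(grad_f(t[s], u_s), dtype=float)
        g = np.einsum("ikm,k->im", C[:, :, :, s], h)   # sum_k C_ik * df/du^(k)
        q = h @ C[:, :, s, s] @ h
        d = np.sqrt(r) * g / np.sqrt(1.0 + r * q)      # Eq. (95)
        C = C - np.einsum("im,jn->ijmn", d, d)         # Eq. (94)
        gain = np.einsum("ikm,k->im", C[:, :, :, s], h)
        u_hat = u_hat + r * (S[s] - f(t[s], u_s)) * gain   # Eq. (92)
    return u_hat, C


# Hypothetical example: one process enters through a sine, one additively.
t = np.linspace(0.0, 1.0, 5)
dt = t[1] - t[0]
Phi = np.stack([np.exp(-np.abs(t[:, None] - t[None, :]) / L) for L in (0.3, 1.0)])
R = np.full(t.size, 0.1)
S = np.array([0.3, -0.2, 0.5, 0.1, -0.4])
u_nl, C_nl = recursive_nonlinear_estimate(
    S, R, dt, Phi, t,
    f=lambda tt, u: np.sin(u[0]) + u[1],
    grad_f=lambda tt, u: np.array([np.cos(u[0]), 1.0]),
)
```

With `f` equal to the plain sum of the components and a gradient of ones, this reduces to the linear, additive recursion, which is a convenient consistency check. Note that a component whose partial derivative vanishes at the current estimate receives no correction at that step, which is one practical reason the small-error assumption matters.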


IV.  Further  Problems 

Several  areas  of  further  research  are  suggested  by  the  foregoing. 
Two  specific  areas  of  useful  investigation  are,  first,  use  of  the 
recursive  framework  to  treat  problems  calling  for  adaptive  estimation 
methods,  and  second,  investigation  of  techniques  for  obtaining  exact 
(not  merely  asymptotic)  minimum  mean  square  error  estimates  in  the 
non-linear  case. 


Adaptive  Estimation  Methods 

As  used  here,  an  adaptive  estimation  problem  refers  to  one  in 
which  the  a-priori  statistics  of  the  observation  errors,  or  the 
statistics  of  the  signals  if  these  are  regarded  as  random  processes, 
are  not  known  exactly. 

The generalized least squares methods described above,
especially in their recursive form, may provide a convenient framework
for treating such problems, as has been recognized by many
workers in this field.

For example, considering the problems treated in Section III,
suppose that the covariance functions $\Phi^{(i)}(t, t')$ and the function
R(t) are not known exactly. How would one modify the recursive,
generalized least squares procedure to incorporate a feature whereby
these covariance functions are estimated from the data and these
estimates then incorporated in the procedure? One possible avenue
of approach is as follows.

Suppose the estimation procedure is initiated simply by assuming
some set of functions $\Phi^{(i)}(t, t')$ and R(t) to insert into the recursive
equations. As previously discussed, the resulting recursive estimates
are still identical (to first order) to some set of generalized least
squares estimates. However, these estimates in effect are obtained by
minimizing a function Q in which the matrix $\eta$ is not the true inverse
covariance matrix of the observation errors (actual and virtual).

However, this may still result in reasonably good estimates, even
though they will not be the best possible. Moreover, the machinery
exists (Ref. 2) by which it is possible to compute the estimation error
covariances as a function of the deviation between the true covariance
matrices and the assumed ones.

Now, this procedure will result in estimates $\hat{u}^{(i)}(t, T)$ of the
random variables $u^{(i)}(t)$. Also, it can be used to produce estimates
$\hat{e}(t, T)$ of the "white noise" component, since $\hat{e}(t, T) =
S(t) - f[t, \hat{u}(t, T)]$. (In the continuous-time case, the estimate
$\hat{e}(t, T)$ would have to be interpreted as an estimate of some suitably
smoothed version of the white noise, since, technically, the white noise
has infinite variance at a single instant.)

Next, the estimates $\hat{u}^{(i)}(t, T)$ and $\hat{e}(t, T)$ can be used to estimate
the covariance functions $\Phi^{(i)}(t, t')$ and R(t). This statement is clear
in case the random processes $\{u^{(i)}(t)\}$ and $\{e(t)\}$ are stationary, in which
case estimates of their covariance functions can be made from a single
time sample.

If they are non-stationary, it is still possible to get estimates
of the covariance functions from a single time sample, but in general
these would be very poor. However, if these unknown processes can
be considered to be simple transformations of stationary processes,
such as integrals of stationary processes, then the covariance function
can be more accurately estimated from a single time sample.
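
For the stationary, zero-mean case, one common way to form such an estimate from a single record is to average lagged products. The Python sketch below is only an illustration of that idea; the choice of the biased estimator (division by the record length N), the function names, and the step of filling a covariance matrix on the grid from the lag estimates are assumptions of the sketch and are not specified in the text.

```python
import numpy as np


def sample_autocovariance(x, max_lag):
    """Lagged-product estimate c[k] ~ E[x(t) x(t + k*dt)] for a zero-mean record."""
    x = np.asarray(x, dtype=float)
    N = x.size
    return np.array([np.dot(x[:N - k], x[k:]) / N for k in range(max_lag + 1)])


def stationary_covariance_matrix(c, M):
    """Fill an M x M matrix Phi_hat[mu, nu] = c[|mu - nu|]; needs len(c) >= M."""
    idx = np.arange(M)
    return c[np.abs(idx[:, None] - idx[None, :])]


# Usage on a short hypothetical record of estimates u_hat(t_mu, T).
record = np.array([1.0, -1.0, 1.0, -1.0])
c_hat = sample_autocovariance(record, max_lag=2)
Phi_hat = stationary_covariance_matrix(c_hat, M=3)
```

A matrix built this way could serve as a replacement for the initially assumed covariance when the initial-condition equations are solved again. For a process that is the integral of a stationary process, the same estimator could be applied to first differences of the record.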

It still remains to specify how one would incorporate the
resulting estimates of the covariance functions (or possibly some
composite estimates, depending both on the covariance functions assumed
a-priori and on those estimated from the data) into the overall estimation
procedure.

Since, in the recursive procedure, the functions $\Phi^{(i)}(t, t')$ enter
into the procedure only via the initial condition equations (84),
(88), (96), or (100), one approach would be, periodically, to go back
and solve for the functions $C_{ij}(t, t', T)$ over again, using
$\hat{\Phi}^{(i)}(t, t', T)$ and $\hat{R}(t, T)$ in place of the initially assumed
$\Phi^{(i)}(t, t')$, R(t) in these equations. Here, $\hat{\Phi}$ and $\hat{R}$ refer to
covariance function estimates making use of data up to time T. The
resulting recomputed values of $C_{ij}(t, t', T)$ would then be used from
that point on in Eqs. (80), (86), (92), or (97). In part, this would
detract from the recursive feature, since it would involve re-solving
for $C_{ij}(t, t', T)$. However, it still preserves the recursive feature
insofar as processing of new observational data is concerned (at
least, this is true in the linear case), since it does not require
any re-processing of the old observational data.

Insofar as the function R(t) is concerned, it would appear
necessary to make some sort of assumption that R(t) is slowly varying.
Some specific problems to be investigated are

a) Computation of the degree to which the accuracy of the estimates
$\hat{u}^{(i)}(t, T)$ is degraded if the initially assumed $\Phi^{(i)}(t, t')$ and R(t)
differ from the true covariance functions of $\{u^{(i)}(t)\}$ and $\{e(t)\}$.
The machinery for this already exists (Ref. 2).

b) Computation of the degree to which the initially assumed functions
can be improved on the basis of the observed data, whether and under
what conditions the resulting estimates $\hat{\Phi}$ and $\hat{R}$ actually approach the
true covariance functions, and if they do, how rapidly as T increases.

c) Devising computationally convenient ways of incorporating improved
estimates of the covariance functions into the procedure.


Exact  Minimum  Mean  Square  Error  Estimates 

Even if conditions A* and B* are satisfied, the generalized
least squares estimation procedures do not give the precise minimum
mean square error estimates for the non-linear case. However, it is
known, at least formally, what the exact minimum mean square error
estimates are. They are the estimates formed by finding the
expected values of $x_i$ with respect to the a-posteriori pdf of
$(x_1, x_2, \ldots)$ based on the observational data. In the notation of
Section II. 1,

\[ \hat{x}_i = \int p(x_1, x_2, \ldots \mid S)\, x_i \, dx_1 \, dx_2 \ldots \tag{101} \]

where

\[ p(x_1, x_2, \ldots \mid S) = \frac{p(S \mid x_1, x_2, \ldots)\, p(x_1, x_2, \ldots)}{p(S)} \tag{102} \]

In Eq. (101), $\hat{x}_i$ now represent the exact minimum mean square
estimates, i.e., those which minimize $E[\hat{x}_i - x_i]^2$ among all possible
estimates $\hat{x}_i$.

Now, in some cases the mean of $p(x_1, x_2, \ldots \mid S)$ occurs for
the same values of $x_i$ as the maximum. In these cases, the MAP
estimates are the minimum mean square error estimates. This is true
in the linear case. However, in general it is not the case.

If one attempts to utilize Eqs. (101) and (102) in the non-linear
case, even for Gaussian additive noise, one quickly gets into
analytically intractable problems.

Incidentally, the exact minimum variance unbiased estimates can
be obtained from Barankin's method (Ref. 4), the application of which to
stochastic processes is analytically tractable up to a point.
However, in general, the minimum variance unbiased estimates are not
the minimum mean square error estimates.

Further investigation of analytically or computationally tractable
ways to use Eqs. (101) and (102) would be very useful.
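
The difference between the MAP estimate and the a-posteriori mean in a non-linear problem can be seen numerically in a very simple scalar case. The example below is hypothetical and not taken from the text: x is given a standard Gaussian a-priori pdf, and there is a single observation S = x^2 + e with Gaussian e of variance r. The a-posteriori pdf of Eq. (102) is evaluated on a grid, its mean is formed as in Eq. (101), and its maximum is located directly.

```python
import numpy as np


def posterior_on_grid(S, r, x):
    """Normalized a-posteriori weights on the grid x for S = x**2 + e, x ~ N(0, 1)."""
    log_post = -0.5 * (S - x**2) ** 2 / r - 0.5 * x**2
    w = np.exp(log_post - log_post.max())     # subtract the maximum for stability
    return w / w.sum()


x = np.linspace(-5.0, 5.0, 2001)
w = posterior_on_grid(S=4.0, r=1.0, x=x)
posterior_mean = np.sum(w * x)                # minimum mean square error estimate
map_estimate = x[np.argmax(w)]                # one of the two symmetric maxima
```

Here the a-posteriori pdf is symmetric with two separated peaks, so the mean lies at zero while the maxima lie near x = ±1.87. The grid evaluation also hints at why Eqs. (101) and (102) become burdensome: the cost of this kind of direct integration grows rapidly with the number of unknown parameters.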
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A  reasonably  tractable  solution  can  be  given  in  the  following 
special  category  of  cases: 

As usual, let $x = (x_1, x_2, \ldots)$. Suppose

\[ S(t) = f(t, x) + e(t), \quad 0 \le t \le T \tag{103} \]

where $\{e(t)\}$ is a Gaussian process with mean zero.

Let $\{t_\mu\}$ be a finite set of points in (0, T). Let $(\eta_{\mu\nu})$ be the
inverse covariance matrix of $\{e(t_\mu)\}$.

Also, let $\hat{x}_i(T)$ be the exact minimum mean square error estimate
of $x_i$ based on the observations over (0, T).

Then,

\[ \hat{x}_i(T) = \Big\{ \int \exp\big[ -\tfrac{1}{2} A + B \big]\, x_i\, p(x)\, dx \Big\} \times \Big\{ \int \exp\big[ -\tfrac{1}{2} A + B \big]\, p(x)\, dx \Big\}^{-1} \tag{104} \]

where $dx = dx_1\, dx_2 \ldots$, p(x) is the a-priori joint pdf of
$x_1, x_2, \ldots$, and

\[ A = \lim \sum_{\mu, \nu} \eta_{\mu\nu}\, f(t_\mu, x)\, f(t_\nu, x) \tag{105} \]

\[ B = \lim \sum_{\mu, \nu} \eta_{\mu\nu}\, f(t_\mu, x)\, S(t_\nu) \tag{106} \]

and the limits in Eqs. (105) and (106) are understood in the sense
that $\{t_\mu\}$ becomes a dense set in (0, T). Reference 2 gives several
ways of evaluating expressions of the form $\lim \sum_{\mu, \nu} \eta_{\mu\nu}\, g(t_\mu)\, h(t_\nu)$.

In the special case that $\{e(t)\}$ is a generalized white noise
process with covariance function $R(t)\, \delta(t - t')$,

\[ A = \int_0^T \frac{f^2(t, x)}{R(t)}\, dt \tag{107} \]

\[ B = \int_0^T \frac{f(t, x)\, S(t)}{R(t)}\, dt \tag{108} \]

Use  of  these  expressions  still  has  the  great  disadvantage  that 
the  functional  form  of  the  integral  over  x  that  must  be  performed  in 
Eq.  (104)  depends  on  the  observed  data.  This  can  be  avoided  in  some 
cases,  however. 

For example, suppose

\[ f(t, x) = F(t)\, H(x) \tag{109} \]

We then have

\[ A = H^2(x) \lim \sum_{\mu, \nu} \eta_{\mu\nu}\, F(t_\mu)\, F(t_\nu) \tag{110} \]

\[ B = H(x) \lim \sum_{\mu, \nu} \eta_{\mu\nu}\, F(t_\mu)\, S(t_\nu) \tag{111} \]

Here, A = A(x, T) does not depend on the observed data and can thus
be pre-computed.

Thus, $\hat{x}_i(T)$ takes the form in this case

\[ \hat{x}_i(T) = G_i\big[ u(T) \big] \tag{112} \]

where

\[ G_i(u) = \Big\{ \int \exp\big[ -\tfrac{1}{2} A(x, T) + H(x)\, u \big]\, x_i\, p(x)\, dx \Big\} \times \Big\{ \int \exp\big[ -\tfrac{1}{2} A(x, T) + H(x)\, u \big]\, p(x)\, dx \Big\}^{-1} \tag{113} \]

\[ u(T) = \lim \sum_{\mu, \nu} \eta_{\mu\nu}\, F(t_\mu)\, S(t_\nu) \tag{114} \]

In the white noise case,

\[ u(T) = \int_0^T \frac{F(t)\, S(t)}{R(t)}\, dt \tag{115} \]

The advantage of this is that the only thing that depends on the
observed data is u(T). Thus, the function $G_i(u)$ can be precomputed
(i.e., the integral over x can be performed prior to any observed
data, regarding u as a variable); and the only thing necessary to
take account of the observed data is to compute u(T) and insert the
computed value into $G_i$.

This can obviously be extended to the class of cases where

\[ f(t, x) = \sum_{k=1}^{K} F^{(k)}(t)\, H^{(k)}(x) \tag{116} \]

Then, A(x, T) still remains independent of the observed data, and

\[ B = \sum_{k=1}^{K} H^{(k)}(x) \Big\{ \lim \sum_{\mu, \nu} \eta_{\mu\nu}\, F^{(k)}(t_\mu)\, S(t_\nu) \Big\} \tag{117} \]

Therefore,

\[ \hat{x}_i(T) = G_i\big[ u^{(1)}(T), \ldots, u^{(K)}(T) \big] \tag{118} \]

where $G_i$ is given by Eq. (104) with B given by Eq. (117) and

\[ u^{(k)}(T) = \lim \sum_{\mu, \nu} \eta_{\mu\nu}\, F^{(k)}(t_\mu)\, S(t_\nu) \tag{119} \]

which, in the white noise case, becomes

\[ u^{(k)}(T) = \int_0^T \frac{F^{(k)}(t)\, S(t)}{R(t)}\, dt \tag{120} \]

The function $G_i[u^{(1)}, \ldots, u^{(K)}]$ can be precomputed regarding
$u^{(k)}$ as variables; and the observed data taken into account by
computing the values of $u^{(k)}(T)$ and inserting them into $G_i$.

It is conjectured that the estimation method defined by Eq. (104)
may give good results in many cases even if $\{e(t)\}$ is not a Gaussian
process. That is, an estimation method which would be the exact
minimum mean square error estimate if $\{e(t)\}$ were Gaussian, may still
be a very good estimate in some cases where $\{e(t)\}$ is not Gaussian.

Also, there may be cases (possibly in orbit estimation, for example)
where these methods are more computationally convenient than usual ones
based on least squares.
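
The precomputation idea for the single-term case f(t, x) = F(t) H(x) can be sketched numerically for a scalar unknown x in white noise. In the Python sketch below, the integrals over x in the expression for G(u) are replaced by sums on a uniform grid, and the time integrals by Riemann sums. The function names, the grid quadrature, and the Gaussian a-priori pdf used in the usage lines are assumptions made for illustration only.

```python
import numpy as np


def make_G(x, prior, H, a):
    """Return G(u), the a-posteriori mean of scalar x as a function of u.

    x     : uniform grid of candidate values of the unknown parameter
    prior : callable giving the (possibly unnormalized) a-priori pdf on x
    H     : callable H(x) from f(t, x) = F(t) H(x)
    a     : integral of F(t)^2 / R(t) over (0, T), so that A(x, T) = a * H(x)^2
    """
    Hx = H(x)
    A = a * Hx**2                    # independent of the observed data
    p = prior(x)

    def G(u):
        expo = -0.5 * A + Hx * u
        w = np.exp(expo - expo.max()) * p     # rescaling cancels in the ratio
        return np.sum(w * x) / np.sum(w)      # uniform grid: dx cancels too

    return G


def white_noise_statistics(F, S, R, dt):
    """Riemann-sum versions of a = int F^2/R dt and u(T) = int F S / R dt."""
    a = np.sum(F**2 / R) * dt
    u = np.sum(F * S / R) * dt
    return a, u


# Usage with hypothetical sampled F, S, R and a standard Gaussian prior.
F = np.ones(4)
S = np.array([1.0, 2.0, 3.0, 4.0])
R = np.full(4, 2.0)
a, u = white_noise_statistics(F, S, R, dt=0.5)
x_grid = np.linspace(-10.0, 10.0, 4001)
G = make_G(x_grid, prior=lambda x: np.exp(-0.5 * x**2), H=lambda x: x, a=a)
x_hat = G(u)
```

Since G depends on the data only through u, it could be tabulated once on a grid of u values and interpolated afterwards. In the linear case H(x) = x with a Gaussian a-priori pdf of unit variance, G(u) should reduce to u / (a + 1), which offers a simple check of the quadrature.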
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Other  Applications 

A  number  of  applications  can  be  thought  of  to  problems  in  which 
there  are  various  mixtures,  all  in  the  same  problem,  of  parameters 
with  which  a-priori  statistics  are  associated  and  those  with  which 
no  a-priori  statistics  are  associated;  random  processes  involving 
mixtures  of  infinite  sets  of  unknown  parameters  and  additional 
finite  sets  of  parameters;  and  mixtures  of  discrete- time  and 
continuous- time  observational  data  or  other  heterogeneous  types  of 
observational  data.  The  foregoing  results  provide  a  systematic 
framework  for  treating  large  classes  of  these  problems,  including 
setting  up  recursive  solutions. 

The  problem  treated  in  Appendix  A  with  its  variations  provides 
one  set  of  examples.  Numerous  other  specific  applications  can 
readily  be  thought  of. 
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Appendix  A.  Estimation  of  Rotation  Rate 

This  problem  is  introduced  as  an  example  of  a  problem  in  which 
a  smoothed  estimate  of  a  parameter  can  be  obtained,  even  though  there 
are  an  infinite  number  of  other  unknown  parameters,  and  even  though 
no  a-priori  statistics  are  associated  with  the  unknown  parameters. 

Suppose a radar target is rotating about a fixed axis. The
returned signal amplitude is observed by the radar, and the object is
to estimate the rotation rate $\omega$. The radar cross-section of the
target will be assumed to vary with viewing aspect. The estimate of
$\omega$ is to be made only by observing the fluctuations in amplitude of
the returned signal (and not by means of spreading of the doppler
spectrum, for example).

It is assumed that nothing is known a-priori about the form of
the radar cross-section vs. aspect (with two minor exceptions to be
noted below).

The received signal is assumed to be

\[ S(t) = \sigma(\omega t) + e(t) \tag{A 1} \]

where $\{e(t)\}$ is a white noise process with one-sided power spectral
density Y.

If we write

\[ \theta = \omega t \tag{A 2} \]

it is assumed that

(i) $\sigma(\theta)$ is periodic with period $2\pi$ but not with any smaller period.

(ii) $\theta\, \sigma'(\theta)$ is square integrable over $(0, 2\pi)$.

Other than this, no knowledge, statistical or otherwise, is
assumed about $\sigma$.

Now, let $\{p_i(\theta)\}$ be any complete set of orthonormal functions
in $L_2(0, 2\pi)$. Then we can write

\[ \sigma(\theta) = \sum_{i=1}^{\infty} \alpha_i\, p_i(\theta) \tag{A 3} \]

Thus, the received signal is

\[ S(t) = f(t, \omega, \alpha_1, \alpha_2, \ldots) + e(t), \quad 0 \le t \le T \tag{A 4} \]

where

\[ f(t, \omega, \alpha_1, \alpha_2, \ldots) = \sum_{i=1}^{\infty} \alpha_i\, p_i(\omega t) \tag{A 5} \]

We now wish to find $E[\hat{\omega} - \omega]^2$ for the maximum likelihood
estimate $\hat{\omega}$ of rotation rate. The approach will be to apply the
formulas of Ref. 2 for the white noise case. This approach leads
to the following.

\[ E[\hat{\omega} - \omega]^2 \simeq (B^{-1})_{00} \tag{A 6} \]

with asymptotic equality, where

\[ B_{ij} = \frac{2}{Y} \int_0^T \frac{\partial f}{\partial x_i}\, \frac{\partial f}{\partial x_j}\, dt \tag{A 7} \]

In Eq. (A 7), i = 0, 1, 2, ...; j = 0, 1, 2, ...;

\[ x_0 = \omega, \qquad x_i = \alpha_i, \quad i \ne 0 \]

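Now, for convenience, suppose the observation time T is such
that it is an integral number of periods:

\[ T = \frac{2\pi N}{\omega} \tag{A 8} \]

Then

\[ B_{00} = \frac{2}{Y \omega^3} \int_0^{2\pi N} \big[ \theta\, \sigma'(\theta) \big]^2 d\theta \tag{A 9} \]

\[ B_{i0} = \frac{2}{Y \omega^2} \int_0^{2\pi N} \theta\, \sigma'(\theta)\, p_i(\theta)\, d\theta, \quad i > 0 \tag{A 10} \]

\[ B_{ij} = \frac{2N}{Y \omega}\, \delta_{ij}, \quad i, j > 0 \tag{A 11} \]

One can now compute $(B^{-1})_{00}$ by truncating the matrix B to an n × n
matrix, finding $(B^{-1})_{00}$ for this n-th order matrix, and then letting
$n \to \infty$. The result is

\[ (B^{-1})_{00} = \Big[ B_{00} - \sum_{i=1}^{\infty} \frac{B_{i0}^2}{B_{ii}} \Big]^{-1} \tag{A 12} \]

Using Eq. (A 9) - Eq. (A 11), this results in

\[ E[\hat{\omega} - \omega]^2 \simeq \frac{Y \omega^3}{2} \Big\{ \int_0^{2\pi N} \big[ \theta\, \sigma'(\theta) \big]^2 d\theta - \frac{1}{N} \sum_{i=1}^{\infty} \Big[ \int_0^{2\pi N} \theta\, \sigma'(\theta)\, p_i(\theta)\, d\theta \Big]^2 \Big\}^{-1} \tag{A 13} \]

or, using Eq. (A 8), an equivalent form is

\[ E[\hat{\omega} - \omega]^2 \simeq \frac{Y}{2} \Big( \frac{2\pi}{T} \Big)^3 \Big\{ \int_0^{2\pi} \big[ \theta\, \sigma'(N\theta) \big]^2 d\theta - \sum_{i=1}^{\infty} \Big[ \int_0^{2\pi} \theta\, \sigma'(N\theta)\, p_i(N\theta)\, d\theta \Big]^2 \Big\}^{-1} \tag{A 14} \]

It might be noted from Eq. (A 13) or Eq. (A 14) that $E[\hat{\omega} - \omega]^2 = \infty$
for N = 1, i.e., for T less than or equal to one period. That this
is what should happen is clear.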
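
The statement for N = 1 can be checked with Parseval's relation, assuming only condition (ii) and the completeness of $\{p_i(\theta)\}$ in $L_2(0, 2\pi)$. With N = 1 the numbers

\[ c_i = \int_0^{2\pi} \theta\, \sigma'(\theta)\, p_i(\theta)\, d\theta \]

are the expansion coefficients of the square integrable function $\theta\, \sigma'(\theta)$ relative to a complete orthonormal set, so that

\[ \sum_{i=1}^{\infty} c_i^2 = \int_0^{2\pi} \big[ \theta\, \sigma'(\theta) \big]^2 d\theta \]

and the quantity in braces in Eq. (A 14) vanishes, giving an infinite value. For N > 1 the functions $p_i(N\theta)$ remain orthonormal on $(0, 2\pi)$ but span only functions of period $2\pi/N$. Since $\theta\, \sigma'(N\theta)$ does not have that period unless $\sigma'$ vanishes identically, Bessel's inequality is then strict,

\[ \sum_{i=1}^{\infty} \Big[ \int_0^{2\pi} \theta\, \sigma'(N\theta)\, p_i(N\theta)\, d\theta \Big]^2 < \int_0^{2\pi} \big[ \theta\, \sigma'(N\theta) \big]^2 d\theta , \]

and the quantity in braces is positive.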
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Now,  suppose  we  compare  Eq.  (A  13)  or  Eq.  (A  14)  with  the 
answer to be obtained when $\sigma(\theta)$ is a-priori known exactly. It might
be noted that in practical cases, even if the nature of the scattering
object were well known, a realistic assumption would have the
absolute amplitude and the initial phase of rotation as unknown
parameters. That is, a realistic model would have

\[ S(t) = a\, \sigma\big[ \omega (t - t_0) \big] + e(t) \tag{A 15} \]

with a, $\omega$, and $t_0$ unknown a-priori.

In the case where $\sigma$ was assumed to be completely unknown,
treated above, it was unnecessary to assume additional unknown parameters
a and $t_0$ since this was automatically taken care of by
assuming that all the $\alpha_i$ in Eq. (A 3) were unknown.

The formulas (Ref. 2) for treating Eq. (A 15) are simple to apply.
However, for present purposes, let us make the somewhat unrealistic
assumption that $\omega$ is the only unknown parameter. (The answer will
then be a lower bound for the case where a, $\omega$, and $t_0$ are all
unknown.) Thus, instead of Eq. (A 15), we will assume

\[ S(t) = \sigma(\omega t) + e(t) \tag{A 16} \]

where $\sigma$ is a known function.

Then,

\[ E[\hat{\omega} - \omega]^2 \simeq \frac{Y \omega^3}{2} \Big\{ \int_0^{2\pi N} \big[ \theta\, \sigma'(\theta) \big]^2 d\theta \Big\}^{-1} = \frac{Y}{2} \Big( \frac{2\pi}{T} \Big)^3 \Big\{ \int_0^{2\pi} \big[ \theta\, \sigma'(N\theta) \big]^2 d\theta \Big\}^{-1} \tag{A 17} \]


Now, finally, consider the case where $\sigma(\theta)$ is considered to
be partly known. The realistic formulation would be

\[ S(t) = \sigma_a(\omega t) + a\, \sigma_b\big[ \omega (t - t_0) \big] + e(t) \tag{A 18} \]

where $\sigma_a$ is completely unknown; $\sigma_b$ is known; and a, $\omega$, $t_0$ are
unknown.

It turns out, however, that if the equation for $E[\hat{\omega} - \omega]^2$ is
applied in this case, without any further assumption of a-priori
knowledge, the answer comes out exactly the same as for the case
first treated, that is, the case where the total function $\sigma$ is
entirely unknown. This is also true if a and $t_0$ are assumed known.
The reason is, of course, that once $\sigma_a$ is considered to be completely
unknown, that is, that the coefficients in the expansion of $\sigma_a$
relative to $\{p_i(\theta)\}$ can be anything, this is equivalent to saying
that the expansion coefficients of $\sigma_a + \sigma_b$ can be anything.

Thus, it turns out that the problem where some portion of $\sigma$ is
known cannot be properly formulated, in such a manner as to reflect
the benefit of partial knowledge of $\sigma$, without associating a-priori
statistics with the unknown part of $\sigma$. For greater simplicity,
suppose we assume that

\[ S(t) = \sigma_a(\omega t) + \sigma_b(\omega t) + e(t) \tag{A 19} \]

where

\[ \sigma_a(\theta) = \sum_{i=1}^{\infty} \alpha_i\, p_i(\theta) \tag{A 20} \]

\[ \sigma_b(\theta) = \sum_{i=1}^{\infty} \beta_i\, p_i(\theta) \tag{A 21} \]

and where all $\beta_i$ are known a-priori.

Suppose that no a-priori statistics are associated with $\omega$.
However, suppose $\sigma_a(\theta)$ is known to be a random process having known
mean and covariance function over $(0, 2\pi)$. Since the mean of $\sigma_a(\theta)$
is assumed known, we can replace $\sigma_a(\theta)$ by $\sigma_a(\theta) - \bar{\sigma}_a(\theta)$ and assume
the mean to be zero.

Now, $\{p_i(\theta)\}$ has heretofore been considered a completely
arbitrary complete orthonormal set in $(0, 2\pi)$. At this point, we will
make the following specific choice of the $\{p_i(\theta)\}$: the orthonormal
eigenfunctions of the covariance function $\Phi_a(\theta, \theta')$ of the random
process $\{\sigma_a(\theta)\}$ over $(0, 2\pi)$.

In that case, the set $\{\alpha_i\}$ have a-priori statistics

\[ E(\alpha_i) = 0, \quad \text{all } i \]

\[ E(\alpha_i\, \alpha_j) = A_i^2\, \delta_{ij} \tag{A 22} \]

where $A_i^2$ are related to the eigenvalues of the kernel $\Phi_a(\theta, \theta')$.
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We  now  have  the  actual  observations  given  by 

\[ S(t) = f(t, \omega, \alpha_1, \alpha_2, \ldots) + e(t) \tag{A 23} \]

\[ f(t, \omega, \alpha_1, \alpha_2, \ldots) = \sum_{i=1}^{\infty} (\alpha_i + \beta_i)\, p_i(\omega t) \tag{A 24} \]

where the $\beta_i$ are all known constants.

The virtual observations are given by

\[ \alpha_i^{(o)} = \alpha_i + \delta \alpha_i^{(o)} \tag{A 25} \]

where the observed values of $\alpha_i^{(o)}$ are $\alpha_i^{(o)} = 0$ and the errors $\delta \alpha_i^{(o)}$
have covariance matrix given by Eq. (A 22).

If we now apply the generalized least squares smoothing
technique to the set of observations consisting of both the actual
and the virtual observations, we obtain the result that $E[\hat{\omega} - \omega]^2$
is given by Eqs. (A 6), (A 9), (A 10), (A 12), and a modified version
of Eq. (A 11):

\[ B_{ij} = \Big[ \frac{2N}{Y \omega} + \frac{1}{A_i^2} \Big] \delta_{ij} \quad (i, j \ge 1) \tag{A 11a} \]

Also, in applying these formulas,

\[ \sigma(\theta) = \sigma_a(\theta) + \sigma_b(\theta) \tag{A 26} \]

Caution must be used in interpreting this result, since it
actually amounts to getting a lower bound on $E[\hat{\omega} - \omega]^2$ relative to the
fictitious statistical ensemble of the actual and virtual observations.
As pointed out in Section II, the statement that this can be considered
equivalent to $E[\hat{\omega} - \omega]^2$ relative to the original statistical ensemble
(defined by the statistics of $\{e(t)\}$ and the a-priori statistics of
$\{\sigma(\theta)\}$) has been proved only if certain first order expansions are
valid. Since we are dealing here with a highly non-linear case, this
amounts to saying that the result for $E[\hat{\omega} - \omega]^2$ stated above (for the
case where a-priori statistics are associated with a portion of $\sigma(\theta)$)
has only been proved if we know $\sigma(\theta)$ pretty well originally.
(This is not to say that it might not be possible to prove the
equivalence under more general conditions in special cases.)

In any event, the main purpose here is not so much to derive
the formulas for $E[\hat{\omega} - \omega]^2$ as to give an illustration of a problem
involving an infinity of unknown parameters, in which smoothed
estimates can be obtained without associating a-priori statistics
with any of the parameters; but for a slightly modified form of the
problem, a proper solution cannot be obtained without associating
a-priori statistics with some of the unknown parameters.
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